Abstract—With the rapid growth of multimedia services and the enormous supply of video content in online social networks, users have difficulty finding content that matches their interests. Therefore, various personalized recommendation systems have been proposed. However, they ignore that the accelerated proliferation of social media data has ushered in the big data era, which greatly impedes video recommendation. In addition, none of them considers the privacy of both users' contexts (e.g., social status, age and hobbies) and video service vendors' repositories, which are highly sensitive and of significant commercial value. To handle these problems, we propose a cloud-assisted differentially private video recommendation system based on distributed online learning. In our framework, service vendors are modeled as distributed cooperative learners that recommend videos according to each user's context while simultaneously adapting their video-selection strategy based on user-click feedback to maximize the total number of user clicks (reward). Considering the sparsity and heterogeneity of big social media data, we also propose a novel geometric differentially private model, which greatly reduces the performance (recommendation accuracy) loss. Our simulations show that the proposed algorithms outperform existing methods and strike a careful balance between accuracy and the level of privacy preservation.

Index Terms —Online social networks, multimedia big data, 
video recommendation, distributed online learning, differential 
privacy, media cloud. 

I. Introduction

In recent years, online social networks (OSNs) have grown massively, and users share and consume all kinds of multimedia content on them. Given the numerous genres of videos in social media, discovering videos of personal interest and recommending them to individual users is of great significance. Recommendation is expected to be one of the most important services for delivering such personalized multimedia content to users [?]. Several companies have demonstrated initial successes in multimedia recommendation system design; [?] reported that YouTube won its first Emmy for video recommendations. In fact, most OSNs recommend video content to their users based on the rich context information (e.g., social status, age, profession, health condition and hobbies) contained in the multimedia data the users release. Along these lines, several recommendation systems have been proposed [?], [24].

However, this scenario poses two major challenges. The first comes from the role of big data in personalized recommendation. OSNs have accelerated the popularity of applications and services, resulting in an explosive increase in social multimedia data. Multimedia big data thus puts companies in a favorable position to access much more contextual information [?]. However, how to harness big data to personalize recommendations effectively is a monumental task. Traditional stand-alone multimedia systems cannot handle the storage and processing of such large-scale datasets [7]. Moreover, the complex and diverse user-generated multimedia big data in OSNs make users' context data sparse and heterogeneous. Hence, it is extremely challenging to implement recommendation on multimedia big data.

Furthermore, privacy in recommendation has raised wide concern. On the one hand, as stated in [5], a user's sensitive context information may be exposed by the recommendation results. Intuitively, the more detailed the information about a user, the more accurate the recommendations for that user. But once the recommendation records are accessed by a malicious third party, individual features can be inferred merely from the outcome of the recommendation. For example, advertising videos of luxury goods recommended to a particular person indicate that user's income level, and basketball videos recommended to the same user expose his or her hobbies. With additional side information, the malicious party may then identify the person in real life. On the other hand, the inventory of videos is an important commercial secret of each service vendor, since vendors rely on their stored video source files to gain popularity among users. Video service vendors are therefore selfish: they do not want others to infer the contents of their inventories from the revenue gain of each video. Consequently, preventing the disclosure of each service vendor's video content is desirable.

Taking these two difficulties into consideration, establishing a privacy-preserving video recommendation system for multimedia big data is extremely challenging. Traditional multimedia recommender systems, including collaborative filtering (CF) [?] and content-based (CB) recommendation [?], can provide meaningful recommendations at an individual level. However, as stand-alone systems, they have difficulty dealing with enormous, high-dimensional multimedia big data. As for privacy in recommendation, anonymity was previously the main tool [?]. However, because identifying information can only be partially removed, anonymized data still allow re-identification.

Differential privacy [?], proposed recently, is a promising way to solve this problem. Informally, differential privacy means that the output is almost exactly the same whether or not a single user's data is included in the input dataset. Therefore, one can hardly make an accurate inference about a single user's features from the recommendation results. Besides, adding Laplace noise to the recommendation rewards can hide small changes that arise from a single video's contribution, so the revenue gain of a single video cannot be deduced. Several studies have incorporated differential privacy into recommendation systems [?], [?], but they focus only on small-scale media datasets, whereas applying differential privacy to large datasets typically has little impact on accuracy, which makes it especially well suited to the big data setting. In conclusion, it is necessary to design a privacy-preserving video recommendation system that can handle multimedia big data and achieve highly accurate recommendations.

In this paper, we introduce differential privacy into distributed online learning to design an efficient, highly accurate and timely recommendation system based on multimedia cloud computing [?]. As illustrated in Fig. 1, user-generated multimedia big data (e.g., images, audio clips and videos) is first transferred to a remote media cloud and stored in decentralized data centers (DCs). Then, techniques such as Bag-of-Features Tagging (BoFT) [?] are used to extract users' context vectors, and the results are delivered to distributed video service vendors (servers). Finally, the recommended video content is pushed to multimedia applications in OSNs.

Our main theme in this media-cloud-based scenario is that video service vendors are modeled as decentralized online learners, which try to learn from users' high-dimensional context data and match each context to the optimal video. The service vendors are connected via a fixed network over the media cloud, and each of them experiences inflows of users' context vectors. If a service vendor cannot find a suitable video in its repository for an incoming user, it can forward the user's context data to a neighboring service vendor, which will find a suitable video in its own repository to recommend to this user. At the end of each time slot, the reward of the recommended video is observed, and service vendors learn from the result and adjust their selection strategy for the next time slot. Since the context vectors extracted from multimedia big data are high-dimensional and diverse, the context space with d dimensions (d is the number of user features) can be enormous and heterogeneous, so learning the best-matching video for each individual can be extremely slow. Therefore, each service vendor initially groups users with similar contexts into rough crowds (i.e., partitions the context space) and then dynamically refines the partition over time.

Fig. 1: A general illustration of the multimedia cloud based video recommendation system.

The goal of each service vendor is to maximize its long-term expected total recommendation reward without revealing its repository to other service vendors. However, during cooperation, each service vendor shares some information, such as users' context vectors and videos' revenue gains, with neighboring service vendors, from which they could infer the repositories of other service vendors. To prevent this privacy leakage, we adopt the Laplace mechanism [?], adding noise to the shared revenue gains. As for users' privacy, i.e., preventing the exposure of their features through the recommended videos, adding noise to the revenue gains is ineffective: the gain is produced only after the recommended video has been revealed, and perturbing a vendor's estimates of its own videos' gains with such noise is unnecessary. Thus, we employ the exponential mechanism [?] to protect users' privacy, where the service vendors randomly select a video according to computed exponential probabilities. Because users' contexts (d-dimensional points in the context space) are sparsely distributed over the context space, we propose a novel geometric differentially private method to increase the total reward. This paper makes the following contributions:
the following contributions: 

• We propose a media cloud based video recommendation system and rigorously formulate it as a distributed online learning problem. In our model, decentralized service vendors work cooperatively to deal with large-scale contextual data.

• To handle the dimensionality and sparsity of multimedia big data, our method adaptively partitions the context space for each service vendor. Our evaluation results show that this method has lower performance loss and converges quickly to the optimal strategy.

• To the best of our knowledge, we are the first to address the privacy of both social media users and video service vendors in recommendation. We integrate the exponential mechanism and the Laplace mechanism simultaneously into distributed learning systems, guaranteeing $\epsilon$-differential privacy without a substantial loss in total reward.

• We propose a "geometric differentially private model" to deal with sparse contextual data, which substantially reduces the performance loss.

The remainder of the paper is organized as follows. In Section II, we briefly review related work. Section III presents the necessary background concepts. In Section IV, we detail the system model and define our performance metric, the adversary model and the design goals. Section V describes the design of the algorithms and provides a theoretical analysis of their performance. In Section VI, we present our geometric differentially private model. Section VII discusses our experimental results and analysis. Section VIII concludes this paper.

II. Related Work 

Several recommendation algorithms have been explored in the past. Content-based (CB) filtering recommendation systems [?] focus on the similarities of content titles, tags and descriptions, and find items of interest based on a user's individual reading history. CB recommender systems are easy to deploy. Nonetheless, simply representing a user's profile information by a bag of words is not sufficient to capture the user's exact interests. Collaborative filtering (CF) recommendation systems [20], [?] rely on abundant user transaction histories and content popularity. CF systems require sufficient consumption records and feedback, which makes them unsuitable for real-time recommendation. Graph-based (GB) recommendation systems [22], [23] build a graph to calculate the correlation between recommendation objects, turning the recommendation problem into a node selection problem on a graph. In addition, users' co-tagging behaviors and friendships in social networks can be described by a graph. Combining graph theory with recommendation is an appealing idea. However, in OSNs this graph changes continuously, so constructing and storing it is impractical. Context-aware recommendation systems make recommendations based on the contextual information of both items and users. [?] did pioneering work in this area, but its centralized framework fails to meet the needs of the big data environment. Our distributed cooperative recommendation framework can make timely recommendations in a big data environment and provides rigorous performance guarantees.

As for privacy in recommendation systems, anonymity was the main tool [?]. However, especially for rich, high-dimensional big data, most anonymization techniques tend to cripple the utility of the data [25], [26]. In addition, even anonymized users may be re-identified in the presence of colluding adversaries or adversaries with auxiliary information. On the other hand, prior works emphasize cryptography [?], using encryption to make privacy-sensitive data inaccessible to outsiders and to the server, but this usually incurs high computation and communication overheads. Differential privacy [?], proposed in recent years, has been incorporated into recommendation by several studies. McSherry and Mironov [?] show how to adapt the leading algorithms used in the Netflix Prize competition to make privacy-preserving recommendations. This is accomplished by adding noise to the item covariance matrix to hide small changes that arise from a single user's contribution. Ashwin et al. [?] and Jorgensen [?] combine differential privacy with social graphs for recommendation. However, their work studies only the privacy of sensitive user-item preferences and connections between people, rather than individual features. Our work targets the privacy of individual features contained in users' context data and the secrecy of service vendors' data.

TABLE I: Comparison with prior work in recommender systems.

III. Background
A. Differential Privacy 

The concept of differential privacy was originally introduced by Dwork [12], and it gives a rigorous definition of privacy.

Definition 1 (Differential Privacy [?]). A randomized algorithm $M$ has $\epsilon$-differential privacy if for any two input sets $A$ and $B$ that differ in a single input, and for any set of outcomes $R \subseteq \mathrm{Range}(M)$,

$$\Pr[M(A) \in R] \le \exp(\epsilon) \times \Pr[M(B) \in R].$$

Informally, differential privacy means that the outcomes on two nearly identical input datasets (differing in a single component) should also be nearly identical. Thus, an attacker cannot obtain an individual's information by comparing the query results on $A$ and $B$. In our model, the input datasets are users' context vectors. The privacy parameter $\epsilon$ measures the privacy level of the algorithm, and the choice of $\epsilon$ is a trade-off between privacy and the accuracy of the output.

One effective tool is the Laplace mechanism [?], i.e., $M(x) = f(x) + \mathrm{Lap}(\Delta f / \epsilon)$. Here, $f(\cdot)$ is a query (e.g., a counting query) on the dataset $x$, and $\mathrm{Lap}(\Delta f / \epsilon)$ is a zero-mean Laplace random variable with scale parameter $\Delta f / \epsilon$ that perturbs the query result.

Definition 2 (Sensitivity of Laplace mechanism [34]). The sensitivity of a function $f$ is

$$\Delta f = \max_{x, y} \| f(x) - f(y) \|_1, \tag{1}$$

where $x$ and $y$ are input datasets that differ in at most one component. The sensitivity of $f$ captures the magnitude by which a single component can change $f$ in the worst case. Indeed, the sensitivity of a function gives an upper bound on how much we must perturb its output to preserve privacy.
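To make the Laplace mechanism concrete, here is a minimal Python sketch, assuming a counting query over 0/1 click feedback so that $\Delta f = 1$. The function names and the example click log are hypothetical. The sampler relies on the standard fact that the difference of two independent exponential variables with mean $b$ is Laplace$(0, b)$, and `laplace_density` can be used to check that, for two counts differing by 1, the output densities differ by at most a factor $e^{\epsilon}$.

```python
import math
import random


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Draw one sample from Laplace(0, scale).

    The difference of two independent exponential variables with mean
    `scale` is Laplace-distributed with location 0 and scale `scale`.
    """
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def laplace_mechanism(true_value: float, sensitivity: float, epsilon: float,
                      rng: random.Random) -> float:
    """Release f(x) + Lap(sensitivity / epsilon)."""
    if sensitivity <= 0 or epsilon <= 0:
        raise ValueError("sensitivity and epsilon must be positive")
    return true_value + laplace_noise(sensitivity / epsilon, rng)


def laplace_density(z: float, loc: float, scale: float) -> float:
    """Probability density of Laplace(loc, scale) evaluated at z."""
    return math.exp(-abs(z - loc) / scale) / (2.0 * scale)


# Hypothetical click log for one video: each entry is 0 or 1, so changing one
# entry changes the total by at most 1 (sensitivity 1).
clicks = [1, 0, 1, 1, 0, 1]
rng = random.Random(7)
noisy_total = laplace_mechanism(sum(clicks), sensitivity=1.0, epsilon=0.5, rng=rng)
print(f"true total = {sum(clicks)}, released total = {noisy_total:.2f}")
```

A smaller $\epsilon$ gives a larger noise scale $\Delta f / \epsilon$, which is exactly the privacy-accuracy trade-off described above.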

Corollary 1 (Composability [?]). The sequential application of randomized computations $M_i$, each giving $\epsilon_i$-differential privacy, yields $\left(\sum_i \epsilon_i\right)$-differential privacy.
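By Corollary 1, a party that releases several noisy statistics can track its total privacy loss by adding up the per-release $\epsilon$ values. The sketch below is a hypothetical helper, not part of P-DAP, that performs this bookkeeping against an assumed total budget.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class PrivacyAccountant:
    """Tracks the cumulative epsilon spent under sequential composition.

    By Corollary 1, running mechanisms with epsilon_1, ..., epsilon_n in
    sequence is (epsilon_1 + ... + epsilon_n)-differentially private.
    """
    budget: float
    spent: List[float] = field(default_factory=list)

    def total(self) -> float:
        return sum(self.spent)

    def charge(self, epsilon: float) -> None:
        """Record one epsilon-DP release, refusing it if the budget would be exceeded."""
        if epsilon <= 0:
            raise ValueError("epsilon must be positive")
        if self.total() + epsilon > self.budget + 1e-12:
            raise RuntimeError("privacy budget exhausted")
        self.spent.append(epsilon)


accountant = PrivacyAccountant(budget=1.0)
for _ in range(4):
    accountant.charge(0.25)   # four releases at epsilon = 0.25 each
print(accountant.total())     # 1.0
```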

Another powerful tool in differential privacy is the exponential mechanism [?]. The exponential mechanism $M_E(x, u, \mathcal{R})$ selects and outputs an element $r \in \mathcal{R}$ with probability proportional to $\exp\left(\frac{\epsilon\, u(x, r)}{2 \Delta u}\right)$. Here, $x$ is the input dataset we want to protect, $r$ is the output of the mechanism and $u(x, r)$ is the utility function. Its sensitivity is defined as follows:

Definition 3 (Sensitivity of Exponential Mechanism [?]). The sensitivity of the exponential mechanism is defined as

$$\Delta u = \max_{r \in \mathcal{R}} \; \max_{x_1, x_2 : \|x_1 - x_2\|_1 \le 1} \left| u(x_1, r) - u(x_2, r) \right|. \tag{2}$$

The sensitivity measures the change in the utility function $u(x, r)$ when one item in the target dataset changes. An important theorem can also be derived [?]:
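The sampling rule of the exponential mechanism can be sketched in a few lines of Python. This is an illustration under assumed inputs (a list of candidate outputs and their utility scores), not the authors' implementation. Subtracting the largest exponent before calling `math.exp` keeps the computation numerically stable without changing the probabilities.

```python
import math
import random
from typing import List, Sequence, TypeVar

Item = TypeVar("Item")


def exponential_mechanism_probs(utilities: Sequence[float], epsilon: float,
                                sensitivity: float) -> List[float]:
    """Pr[r] proportional to exp(epsilon * u(x, r) / (2 * sensitivity))."""
    scaled = [epsilon * u / (2.0 * sensitivity) for u in utilities]
    top = max(scaled)  # subtracting the max avoids overflow and keeps ratios unchanged
    weights = [math.exp(s - top) for s in scaled]
    total = sum(weights)
    return [w / total for w in weights]


def exponential_mechanism(candidates: Sequence[Item], utilities: Sequence[float],
                          epsilon: float, sensitivity: float,
                          rng: random.Random) -> Item:
    """Sample one candidate according to the exponential-mechanism distribution."""
    probs = exponential_mechanism_probs(utilities, epsilon, sensitivity)
    return rng.choices(candidates, weights=probs, k=1)[0]


# Hypothetical example: three candidate videos with utilities in [0, 1].
videos = ["news", "sports", "music"]
scores = [0.2, 0.8, 0.5]
print(exponential_mechanism_probs(scores, epsilon=1.0, sensitivity=1.0))
print(exponential_mechanism(videos, scores, 1.0, 1.0, random.Random(3)))
```

Higher-utility outputs are favored, but every output keeps a nonzero probability, which is what prevents an observer from reading the utility ranking directly off the output.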

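Theorem 1. Fix a database $x$, and let $\mathcal{R}_{OPT} = \{ r \in \mathcal{R} : u(x, r) = OPT_u(x) \}$ denote the set of elements in $\mathcal{R}$ that attain the utility score $OPT_u(x) = \max_{r \in \mathcal{R}} u(x, r)$. Then, when used to select an output $r \in \mathcal{R}$, the exponential mechanism $\mathcal{E}_\epsilon(x)$ ensures that

$$\Pr\left[ u(x, \mathcal{E}_\epsilon(x)) \le \max_{r \in \mathcal{R}} u(x, r) - \frac{2 \Delta u}{\epsilon} \left( \ln \frac{|\mathcal{R}|}{|\mathcal{R}_{OPT}|} + t \right) \right] \le \exp(-t).$$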
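Theorem 1 can be checked numerically for a small, hypothetical utility vector by computing the mechanism's exact output distribution and comparing the probability of a large utility shortfall with $\exp(-t)$. The sketch assumes utilities $0, 1, \ldots, 19$ with a single optimal element, $\Delta u = 1$ and $\epsilon = 1$; all names are illustrative.

```python
import math
from typing import List, Sequence


def exponential_mechanism_probs(utilities: Sequence[float], epsilon: float,
                                sensitivity: float) -> List[float]:
    """Pr[r] proportional to exp(epsilon * u(x, r) / (2 * sensitivity))."""
    scaled = [epsilon * u / (2.0 * sensitivity) for u in utilities]
    top = max(scaled)
    weights = [math.exp(s - top) for s in scaled]
    total = sum(weights)
    return [w / total for w in weights]


def utility_gap_bound(sensitivity: float, epsilon: float, n_outputs: int,
                      n_optimal: int, t: float) -> float:
    """Shortfall (2 * sensitivity / epsilon) * (ln(|R| / |R_OPT|) + t) from Theorem 1."""
    return (2.0 * sensitivity / epsilon) * (math.log(n_outputs / n_optimal) + t)


def exact_tail_probability(utilities: Sequence[float], epsilon: float,
                           sensitivity: float, gap: float) -> float:
    """Exact Pr[u(x, output) <= OPT - gap] under the exponential mechanism."""
    probs = exponential_mechanism_probs(utilities, epsilon, sensitivity)
    opt = max(utilities)
    return sum(p for p, u in zip(probs, utilities) if u <= opt - gap)


# Hypothetical utilities 0, 1, ..., 19 with a single optimal output.
utils = [float(u) for u in range(20)]
for t in (0.5, 1.0, 2.0):
    gap = utility_gap_bound(1.0, 1.0, len(utils), 1, t)
    tail = exact_tail_probability(utils, 1.0, 1.0, gap)
    print(f"t={t}: gap={gap:.2f}, Pr[shortfall >= gap]={tail:.4f}, exp(-t)={math.exp(-t):.4f}")
```

The bound grows only logarithmically with the number of candidate outputs $|\mathcal{R}|$ and shrinks as $\epsilon$ grows, i.e., as less privacy is required.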

B. Online Learning 

Our proposed distributed learning method derives from contextual bandits [?]. A contextual bandit algorithm learns from the context information available at each time, which in our case is the users' context vectors. It keeps an index that weighs the estimated performance and the uncertainty of each action (a recommended video or a neighboring service vendor in our case) and chooses the action with the highest index at each time. The indices of all actions for the next time slot are then updated based on the feedback received from the chosen action (user click feedback). Several works study contextual bandits [29], [30], where the best action given the context is learned online. C. Tekin et al. first proposed a distributed contextual bandit framework for big data classification [?] and social recommendations [32]. However, the uniform partition method proposed in their work does not fit sparse big data. The context-aware partition method for big data proposed in [?] is an inspiring work; nonetheless, its single-learner framework cannot meet the needs of massive social big data. We combine adaptive context space partition with distributed learning, which can efficiently handle the above difficulties.
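As one concrete choice of such an index, the sketch below uses the UCB1 form (sample mean plus $\sqrt{2 \ln t / n}$). This is an assumption for illustration only: the P-DAP algorithm in Section V instead uses control functions to decide when to explore. Statistics are assumed to be kept separately for each group of similar contexts, and the arm names are hypothetical.

```python
import math
from typing import Dict, Hashable, Tuple


def ucb_index(sample_mean: float, n_selected: int, t: int) -> float:
    """Estimated reward plus an uncertainty bonus (UCB1 form).

    Arms that have never been tried get +inf so that they are tried first.
    """
    if n_selected == 0:
        return math.inf
    return sample_mean + math.sqrt(2.0 * math.log(t) / n_selected)


def choose_arm(stats: Dict[Hashable, Tuple[float, int]], t: int) -> Hashable:
    """stats maps arm -> (sample mean reward, times selected) for one context region."""
    return max(stats, key=lambda arm: ucb_index(stats[arm][0], stats[arm][1], t))


# Hypothetical statistics for one group of similar contexts at time t = 52.
stats = {"video_a": (0.6, 50), "video_b": (0.5, 2)}
print(choose_arm(stats, t=52))  # prints video_b: its large uncertainty bonus dominates
```

After the chosen arm's click feedback is observed, its sample mean and count are updated, which changes its index for the next time slot.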

IV. Problem Formulation 

In this section, we first present the system model and 
assumptions. Then we give our performance metric. Finally, 
we outline the adversary model and design goals. 



Fig. 2: A general explanation of our video recommendation system. Each service vendor keeps a context space partition of arriving contexts. This partition changes dynamically over time.

A. System Model 

The system model is shown in Fig. 2. There are M service vendors distributed in the media cloud, indexed by the set $\mathcal{M} = \{1, 2, 3, \ldots, M\}$. They work independently and cooperatively in a discrete-time setting $t = 1, 2, \ldots, T$. Each vendor owns a set of videos; we denote the set of videos of service vendor i by $\mathcal{M}_i = \{k_1, k_2, \ldots, k_K\}$. At each time slot, the following events happen sequentially for service vendor i: 1) a user's extracted context vector $x_i(t)$ arrives at service vendor i; 2) service vendor i chooses one video from its repository $\mathcal{M}_i$ or sends the context vector to a neighboring service vendor j, which selects one video from $\mathcal{M}_j$ for the user with this context; 3) at the end of the time slot, the user's click feedback $r_{k, x_i(t)}(t)$ is observed, where k is the recommended video (it equals one if the user clicks and zero otherwise); 4) service vendor i learns from the feedback and improves its selection strategy for the next user.

We now describe the details and some reasonable assumptions.

1) Each service vendor has access only to its own video repository. Service vendors are selfish in the sense that they do not reveal their repositories to other service vendors, but they know how many videos the other service vendors have. In this article, we assume that every service vendor possesses K videos.

2) The context information $x_i(t)$ of the data is a high-dimensional vector. Each coordinate of the vector represents a feature of the user (e.g., gender, hobby, profession and age). We use the hypercube $\mathcal{X} = [0, 1]^d$ to denote its range, where d is the dimension of the space. In the big data setting, d is extremely large and the context vectors are distributed non-uniformly in the hypercube.

3) At the end of each time slot, we use a random variable $r_{k,x}(t)$ to represent the reward (user click feedback) produced by the recommended video k; it equals one if the user clicks the recommended video k and zero otherwise. Let

$$\mu_{k,x} = \mathbb{E}[r_{k,x}(t)] \tag{3}$$

denote the expected reward of video k conditional on the context x. Different videos have different expected rewards for the same context, and we aim to find the video with the highest expected reward for that context. Naturally, similar contexts yield similar expected rewards for the same video. We use the Lipschitz condition to describe this similarity:

$$|\mu_{k,x_1} - \mu_{k,x_2}| \le L \|x_1 - x_2\|^{\alpha}, \tag{4}$$

where L and α are positive constants.
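Combined with the uniform partition described in Section V, where a level-l subspace is a hypercube of side $m^{-l}$, condition (4) bounds how much the expected reward of a video can vary inside one subspace. The sketch below assumes the Euclidean norm in (4), since the norm is not specified here, and uses hypothetical values for L, α, d and m.

```python
import math


def within_subspace_gap(L: float, alpha: float, d: int, m: int, level: int) -> float:
    """Upper bound on |mu_{k,x1} - mu_{k,x2}| for x1, x2 in one level-`level` hypercube.

    Assumes the Euclidean norm in condition (4): a hypercube of side m**-level
    in d dimensions has diameter sqrt(d) * m**-level.
    """
    diameter = math.sqrt(d) * m ** (-level)
    return L * diameter ** alpha


# Hypothetical parameters: L = 1, alpha = 1, d = 4, m = 2.
for level in range(4):
    print(level, within_subspace_gap(L=1.0, alpha=1.0, d=4, m=2, level=level))
```

Each refinement shrinks the bound by a factor $m^{\alpha}$, which is why sharing one reward estimate per subspace becomes more accurate as the partition is refined.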

The goal of each service vendor is to recommend the video with the highest expected reward. Consequently, if a service vendor has no video that matches an incoming user's context, it forwards the context to a neighboring service vendor. Our algorithm chooses another service vendor by comparing the average rewards of each service vendor with those of its own videos. Accordingly, in this distributed contextual bandit framework, we call $\mathcal{K}_i = \mathcal{M}_i \cup \mathcal{M}_{-i}$ the set of arms (videos and other service vendors) of service vendor i, where $\mathcal{M}_{-i} = \mathcal{M} \setminus \{i\}$.


B. Performance Metric 

Definition 4 (Optimal Arm). Our benchmark when evaluating the performance of the learning algorithm is the optimal solution, which selects the arm k with the highest expected reward from the set $\mathcal{K}_i = \mathcal{M}_i \cup \mathcal{M}_{-i}$ given context $x_t$ at time t. Specifically, the optimal arm we compare against is given by

$$k^*(x_t) = \arg\max_{k \in \mathcal{K}_i} \mu_{k, x_t}, \quad \forall x_t \in \mathcal{X}. \tag{5}$$

Knowing the optimal solution means that learner i (service vendor i in this case) knows the arm in $\mathcal{K}_i$ that yields the highest expected reward for each $x_t \in \mathcal{X}$.

Definition 5 (The Regret of Learning). We define the regret as a performance measure of the learning algorithm used by the learners. Simply put, the regret of the learning algorithm of learner i is the reward gap between the optimal arms and the selected arms:

$$R(T) = \sum_{t=1}^{T} \mu_{k^*(x_t), x_t} - \mathbb{E}\left[ \sum_{t=1}^{T} r_{k(t), x_t}(t) \right], \tag{6}$$

where $k(t)$ denotes the video or neighboring service vendor chosen at time t, and $k^*(x_t)$ denotes the best choice for context $x_t$. The regret indicates the convergence rate of the total expected reward of the learning algorithm to the value of the optimal solution.
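For simulation purposes, the quantity in (6) can be tracked as a running sum of per-slot reward gaps. The helper below is a hypothetical sketch: it replaces each realized click by the expected reward of the chosen arm, i.e., it evaluates (6) conditioned on the sequence of chosen arms, and the expected click rates in the usage are made up.

```python
from itertools import accumulate
from typing import List, Sequence


def cumulative_regret(optimal_means: Sequence[float],
                      chosen_means: Sequence[float]) -> List[float]:
    """Running value of sum_t (mu_{k*(x_t), x_t} - mu_{k(t), x_t}).

    Uses the expected reward mu_{k(t), x_t} of each chosen arm in place of
    the realized click r_{k(t), x_t}(t).
    """
    if len(optimal_means) != len(chosen_means):
        raise ValueError("sequences must have equal length")
    gaps = [best - got for best, got in zip(optimal_means, chosen_means)]
    return list(accumulate(gaps))


# Hypothetical expected click rates over three time slots.
regret = cumulative_regret([0.9, 0.8, 0.7], [0.5, 0.8, 0.6])
print(regret)  # approximately [0.4, 0.4, 0.5]
```

Plotting `regret[t] / (t + 1)` over a long run gives a quick empirical check of sublinearity: the ratio should tend to zero.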


C. Adversary Model and Design Goals 

Following a privacy concern for users' sensitive context data similar to that in [?], we consider the following adversary model. (1) A malicious third party that can access the recommendation outputs and owns some side information (such as location) about some users. Its goal is to deduce a particular user's features by observing the recommendation outputs; with the deduced features and additional side information, it can then identify the media user in the real world. (2) Selfish and curious service vendors that want to infer their neighbors' repositories from shared information. For example, a curious service vendor forwards a sports fan's context to a neighboring service vendor, which outputs a video and receives a high reward; the curious service vendor then learns that this neighbor owns a sports video.

To address the adversary models above, we propose a differentially private learning algorithm. Our scheme achieves the following privacy protection and performance guarantees:

• Users' Privacy Guarantee: Even if the malicious party gains access to the recommendation outputs, it is unlikely to infer the user's features from the recommended results. We prove that our proposed algorithm preserves $\epsilon$-differential privacy for users.

• Service Vendors' Privacy Guarantee: A curious service vendor cannot distinguish the videos of neighboring service vendors from the shared information. The proposed algorithm preserves $\epsilon$-differential privacy for service vendors.

• Performance Guarantee: Our proposed algorithm guarantees that the regret in equation (6) is sublinear, i.e., $R(T) = O(T^{\gamma})$ with $\gamma < 1$. A smaller γ results in a faster convergence rate. In the following section, we propose a private distributed learning algorithm with sublinear regret.

• Privacy-Reward Trade-off: Our analysis shows that the higher the level of privacy preserved, the lower the total reward received. By varying the privacy parameter $\epsilon$, we can balance the total recommendation reward against the privacy preservation level.

V. Differentially Private Distributed Online Learning Algorithm for Cloud Based Video Recommendation

Since the rewards of each recommended video for different users follow unknown stochastic distributions, the natural way to learn a video's performance is to record and update its sample mean reward for each context vector. Using such empirical values to estimate the expected rewards is the basic approach by which service vendors learn. However, the context space $\mathcal{X}$ can be very large, so recording and updating the sample mean reward for every context is hardly feasible: the memory capacity of the server cannot accommodate a sample mean reward for all contexts. To overcome this difficulty, we dynamically partition the entire context space into multiple smaller context subspaces (according to the number of arriving users) and maintain and update sample mean reward estimates for each subspace. This works because the expected rewards of a video are likely to be similar for similar contexts.

In our distributed framework, each service vendor $i \in \mathcal{M}$ dynamically partitions the context space $\mathcal{X}$ as contexts $x_i(t)$ arrive. For ease of understanding, we split the proposed P-DAP algorithm into two algorithms, i.e., Algorithm 1 and Algorithm 2. Service vendor i runs Algorithm 1 to select a video or request a neighboring service vendor's help for its own user. Because service vendor i does not disclose its own recommendation revenue gains to other service vendors, we only need to protect the user's privacy, and we adopt the exponential mechanism in Algorithm 1 (named ExP-DAP) to achieve this protection. When service vendor i receives users' extracted context vectors forwarded from other service vendors, it runs Algorithm 2 (named LaP-DAP) to select videos and protect the privacy of the selected videos. The two algorithms run simultaneously, although we describe them separately.

Next, we present our online learning algorithm. In Section VI, we will refine the proposed algorithm with geometric differential privacy to reduce the performance loss.

A. Algorithm Description 

In this subsection, we describe our differentially Private Distributed learning with Adaptive context space Partition algorithm (P-DAP for short) for video recommendation. We first introduce several useful concepts for describing the proposed algorithm.

• Context subspace. A context subspace C is a subspace of the entire context space $\mathcal{X}$, i.e., $C \subseteq \mathcal{X}$. In this paper, all context subspaces are created by uniformly partitioning the context space along each dimension. Thus, each context subspace is a d-dimensional hypercube with side length $m^{-l}$, where m is the number of segments into which each dimension is divided and l is the partition level. For example, when m = 2, d = 1 and the entire space is [0, 1], the entire context space [0, 1] is a level-0 subspace, [0, 1/2) and [1/2, 1] are two level-1 subspaces, and so on.

• Active context subspace. We define a set $\mathcal{P}^i$ that collects all existing subspaces of service vendor i; $\mathcal{P}^i$ changes over time. For example, when d = 1, {[0, 1]} and {[0, 1/2), [1/2, 1]} are two possible sets of active context subspaces. A context subspace C is active if it is in the current context subspace set $\mathcal{P}^i$, i.e., $C \in \mathcal{P}^i$.

• Notations. For service vendor i and each active context subspace $C \in \mathcal{P}^i$, the algorithm maintains a counter $N^i_{k,C}(t)$ recording the number of times arm k has been selected for contexts belonging to subspace C. $\bar{r}^i_{k,C}(t)$ estimates the sample mean reward of arm k for context subspace C up to time t, i.e., $\bar{r}^i_{k,C}(t) = \frac{1}{N^i_{k,C}(t)} \sum_{\tau \le t:\, x_i(\tau) \in C,\, k(\tau) = k} r_{k, x_i(\tau)}(\tau)$. The algorithm also maintains a counter $M^i_C(t)$ that records the number of context arrivals to C up to time t.
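The subspace bookkeeping described in the bullets above can be sketched with integer cell indices, which avoids floating-point boundary errors when locating the subspace of a context. The class and function names are hypothetical. The cell convention (cell j covers $[j m^{-l}, (j+1) m^{-l})$, with the last cell closed at 1) follows the d = 1 example in the first bullet.

```python
from dataclasses import dataclass
from itertools import product
from typing import List, Sequence, Tuple


@dataclass(frozen=True)
class Subspace:
    """A level-l hypercube of side m**-l in [0, 1]^d, stored by integer cell indices.

    Along each dimension, cell j covers [j * m**-l, (j + 1) * m**-l); the last
    cell also contains the right endpoint 1, matching [1/2, 1] in the example.
    """
    level: int
    index: Tuple[int, ...]


def cell_of(x: Sequence[float], level: int, m: int) -> Tuple[int, ...]:
    """Integer index of the level-`level` cell that contains context x."""
    n = m ** level
    return tuple(min(int(xi * n), n - 1) for xi in x)


def children(c: Subspace, m: int) -> List[Subspace]:
    """Split c uniformly into m**d level-(l+1) hypercubes."""
    return [Subspace(c.level + 1, tuple(j * m + o for j, o in zip(c.index, offset)))
            for offset in product(range(m), repeat=len(c.index))]


def find_active(active: List[Subspace], x: Sequence[float], m: int) -> Subspace:
    """Return the active subspace C in P^i that contains x."""
    for c in active:
        if cell_of(x, c.level, m) == c.index:
            return c
    raise ValueError("the active set does not cover x")


# d = 1, m = 2: [0, 1] splits into [0, 1/2) and [1/2, 1].
root = Subspace(0, (0,))
active = children(root, m=2)
print(active)
print(find_active(active, [0.5], m=2))  # the level-1 cell [1/2, 1]
```

Because `find_active` compares cells level by level, it also works when the active set mixes subspaces of different levels, as happens once only some regions have been refined.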

To begin with, we present Algorithm 1 in the following three phases.

Phase 1: Exploration and Reward Estimation



Fig. 3: A process of dynamic partition of context space 


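Algorithm 1 ExP-DAP for service vendor i's own user
1: Input: $\mathcal{K}_i$, $m$, $\rho$, $A$, $T$, $\epsilon$, $\Delta u$, $G_1(t)$, $G_2(t)$, $G_3(t)$.
2: Initialize: $\mathcal{P}^i = \{\mathcal{X}\}$, $N^i_{k,\mathcal{X}}(0) = 0$ for all $k \in \mathcal{K}_i$, $M^i_{\mathcal{X}}(0) = 0$, $\bar{r}^i_{k,\mathcal{X}}(0) = 0$, $l = 0$.
3: for $t = 1, \ldots, T$, with $x_i(t) \in C$ do
4:     if $\exists k \in \mathcal{M}_i$ such that $N^i_{k,C}(t) \le G_1(t)$ then
5:         Select $k$ and observe $r_{k, x_i(t)}(t)$.
6:     else if $\exists k \in \mathcal{M}_{-i}$ such that $N^{i,1}_{k,C}(t) \le K G_3(t)$ then
7:         Forward $x_i(t)$ to service vendor $k$.
8:     else if $\exists k \in \mathcal{M}_{-i}$ such that $N^{i,2}_{k,C}(t) \le G_2(t)$ then
9:         Forward $x_i(t)$ to service vendor $k$ and receive the (noisy) reward.
10:    else
11:        for all $k \in \mathcal{M}_i$ do
12:            $\Pr[\text{select } k] = \exp\left(\frac{\epsilon\, \bar{r}^i_{k,C}(t)}{2 \Delta u}\right) \Big/ \sum_{k' \in \mathcal{M}_i} \exp\left(\frac{\epsilon\, \bar{r}^i_{k',C}(t)}{2 \Delta u}\right)$
13:        end for
14:        Select $k_i \in \mathcal{M}_i$ according to the computed probability distribution.
15:        Select $k_j \in \mathcal{M}_{-i}$ such that $k_j = \arg\max_{k \in \mathcal{M}_{-i}} \bar{r}^i_{k,C}(t)$.
16:        Choose $k$ such that $\bar{r}^i_{k,C}(t) = \max\left(\bar{r}^i_{k_i,C}(t), \bar{r}^i_{k_j,C}(t)\right)$.
17:    end if
18:    Update $M^i_C(t)$, $N^i_{k,C}(t)$ and $\bar{r}^i_{k,C}(t)$.
19:    if $M^i_C(t) \ge A m^{\rho l}$ then
20:        Partition $C$.
21:    end if
22: end for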


Upon each context arrival, service vendor i first checks to which subspace C in the set $\mathcal{P}^i$ the context belongs, and the level of C. To obtain accurate performance estimates for each arm $k \in \mathcal{M}_i$, service vendor i needs to check whether k has been sufficiently explored (lines 4, 5). Since service vendor i does not know the performance of service vendor k's videos, it needs to send neighboring service vendor k some context samples to train it and to make sure that it will mostly select the optimal video (lines 6, 7). $N^{i,1}_{k,C}(t)$ denotes the number of times $k \in \mathcal{M}_{-i}$ has been selected for training. During training, service vendor i does not need to communicate with service vendor k to observe the reward $r_{k, x_i(t)}(t)$. After service vendor $k \in \mathcal{M}_{-i}$ has been fully trained, service vendor i starts to explore the performance of learner $k \in \mathcal{M}_{-i}$ and observes the reward of each such k (lines 8, 9). The control functions $G_1(t)$, $G_2(t)$ and $G_3(t)$ ensure that each arm is selected sufficiently many times so that the sample mean estimates are accurate enough. We set different control functions for $k \in \mathcal{M}_{-i}$ and $k \in \mathcal{M}_i$, i.e., $G_2(t)$ is larger than $G_1(t)$: because the reward received from $k \in \mathcal{M}_{-i}$ is perturbed with noise, more samples are needed to evaluate its performance.
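The branch order of lines 4-9 can be written as a small Python function. The control functions $G_1(t)$, $G_2(t)$ and $G_3(t)$ are not specified in this section, so the ones in the usage ($G_1(t) = \sqrt{t} \ln t$, $G_2 = 2 G_1$, $G_3 = G_1$) are hypothetical placeholders that merely respect $G_2(t) > G_1(t)$; the counter dictionaries are likewise an assumed representation.

```python
import math
from typing import Callable, Dict, Hashable, Optional, Tuple


def exploration_step(t: int,
                     own_counts: Dict[Hashable, int],
                     train_counts: Dict[Hashable, int],
                     explore_counts: Dict[Hashable, int],
                     G1: Callable[[int], float],
                     G2: Callable[[int], float],
                     G3: Callable[[int], float],
                     K: int) -> Tuple[str, Optional[Hashable]]:
    """Mirror the branch order of lines 4-9 of Algorithm 1 for one subspace C.

    own_counts:     N^i_{k,C}(t) for own videos k in M_i
    train_counts:   N^{i,1}_{k,C}(t) for neighbor vendors k in M_{-i}
    explore_counts: N^{i,2}_{k,C}(t) for neighbor vendors k in M_{-i}
    """
    for k, n in own_counts.items():
        if n <= G1(t):
            return "explore_own", k         # lines 4-5
    for k, n in train_counts.items():
        if n <= K * G3(t):
            return "train_neighbor", k      # lines 6-7
    for k, n in explore_counts.items():
        if n <= G2(t):
            return "explore_neighbor", k    # lines 8-9
    return "exploit", None                  # lines 10-16: private decision


# Hypothetical control functions; G2 is larger than G1 as the text requires.
def G1(t: int) -> float:
    return math.sqrt(t) * math.log(t)


def G2(t: int) -> float:
    return 2.0 * G1(t)


def G3(t: int) -> float:
    return G1(t)


print(exploration_step(10, {"v1": 10, "v2": 3}, {2: 0}, {2: 0}, G1, G2, G3, K=2))
```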

Phase 2: Decision with Privacy Protection 

For subspace C, once all arms have been fully explored, accurate sample-mean estimates are available for each arm. In traditional bandit algorithms, the learner (here, the service vendor) usually selects the arm with the highest sample-mean reward. However, the optimal arm may expose individual features of the user. Thus, to protect the user's privacy, service vendor i first randomly chooses one arm $k_i \in \mathcal{M}_i$ according to the computed probability distribution, where $\Delta u$ is the sensitivity of the exponential mechanism (lines 11-14). Then, it selects another arm $k_j \in \mathcal{M}_{-i}$ with the highest estimated reward. Finally, service vendor i compares the estimated rewards of $k_j$ and $k_i$ and selects the one with the higher estimated reward for context $x_i(t)$ (lines 15, 16). We prove in the analysis section that this randomized selection scheme guarantees ε-differential privacy.
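Phase 2 can be sketched as follows, assuming the estimated rewards are available as dictionaries keyed by hypothetical arm ids. The exponential-mechanism weights follow line 12 of Algorithm 1, with `du` standing for the sensitivity $\Delta u$; `private_decision` is an illustrative name, and `random.Random.choices` performs the weighted draw of $k_i$.

```python
import math
import random
from typing import Dict

def exp_mech_probs(scores: Dict[str, float], eps: float, du: float) -> Dict[str, float]:
    """Exponential mechanism over own videos: P[k] proportional to exp(eps * r_k / (2 du))."""
    top = max(scores.values())  # shift for numerical stability; cancels after normalising
    w = {k: math.exp(eps * (s - top) / (2.0 * du)) for k, s in scores.items()}
    z = sum(w.values())
    return {k: v / z for k, v in w.items()}

def private_decision(own: Dict[str, float], nbr: Dict[str, float],
                     eps: float, du: float, rng: random.Random) -> str:
    probs = exp_mech_probs(own, eps, du)
    arms = list(probs)
    k_i = rng.choices(arms, weights=[probs[a] for a in arms], k=1)[0]  # lines 11-14
    k_j = max(nbr, key=nbr.get)                                        # line 15
    return k_i if own[k_i] >= nbr[k_j] else k_j                        # line 16
```

Only the own-video choice is randomized; the neighbor arm is taken greedily, mirroring lines 14-16.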

Phase 3: Update and Partition the Context Subspace 

At the end of each time slot, the algorithm first updates $M_C(t)$, $N_{k,C}(t)$ and $\bar r_{k,C}(t)$ for the selected arm k, where $M_C(t) = M_C(t-1) + 1$, $N_{k,C}(t) = N_{k,C}(t-1) + 1$, and $\bar r_{k,C}(t)$ is the sample mean of the rewards observed for arm k on contexts in C, i.e., $\bar r_{k,C}(t) = \frac{1}{N_{k,C}(t)}\sum_{\tau \le t:\ x(\tau) \in C,\ k \text{ selected}} r_{k,x(\tau)}(\tau)$. Then the algorithm decides whether to further partition the current subspace C, depending on whether sufficiently many context vectors have arrived in C. Specifically, if $M_C(t) > A m^{\rho l}$ at time t, C is further partitioned, where ρ and m are positive numbers. When partitioning is needed, C is uniformly partitioned into $m^d$ smaller hypercubes. Each hypercube is a level-(l+1) subspace whose side length is 1/m of that of C. Then C is removed from the current set $\mathcal{P}$, and the new subspaces are added to $\mathcal{P}$. Fig. 3 illustrates this partition process for m = 2, d = 2.
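A sketch of the Phase 3 bookkeeping, assuming contexts lie in $[0,1]^d$. Each active subspace is keyed by its level l and its integer cell coordinates at that level (a representation chosen here for illustration, not taken from the paper), and a cell is split into its $m^d$ children once $M_C(t) > A m^{\rho l}$.

```python
from itertools import product
from typing import Dict, Tuple

Key = Tuple[int, Tuple[int, ...]]  # (level l, integer cell coordinates at that level)

def find(active: Dict[Key, int], x: Tuple[float, ...], m: int) -> Key:
    """Descend from the root to the active subspace that contains context x in [0, 1]^d."""
    level, idx = 0, tuple(0 for _ in x)
    while (level, idx) not in active:
        n = m ** (level + 1)
        # child cell containing x, clamped to the m^d children of the current cell
        idx = tuple(min(max(int(xi * n), m * i), m * i + m - 1) for xi, i in zip(x, idx))
        level += 1
    return (level, idx)

def observe(active: Dict[Key, int], x: Tuple[float, ...], m: int,
            A: float, rho: float) -> Key:
    """Count one arrival in its subspace C and split C once M_C(t) > A m^(rho l)."""
    key = find(active, x, m)
    active[key] += 1                      # M_C(t) = M_C(t-1) + 1
    level, idx = key
    if active[key] > A * m ** (rho * level):
        del active[key]                   # C leaves the active set
        for off in product(range(m), repeat=len(idx)):
            active[(level + 1, tuple(m * i + j for i, j in zip(idx, off)))] = 0
    return key
```

With m = 2, d = 2, A = 2 and ρ = 1, the third arrival in the root pushes $M_C(t)$ above $A m^{0} = 2$, so the root is replaced by four level-1 subspaces, as in Fig. 3.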

We now describe Algorithm 2. In our problem setting, protecting the privacy of neighbor service vendors poses a major challenge: traditional differential privacy applies only to static databases, whereas the datasets we want to protect are released dynamically over time. In detail, suppose that at every time step t ∈ [T], one entry $f_{k,x(t)} \in \{0, 1\}$ of dataset D arrives, and the task is to output $v_t = \sum_{\tau=1}^{t} f_{k,x(\tau)}$ while ensuring that the complete output sequence $(v_1, \dots, v_T)$ is ε-differentially private. To overcome this challenge, we use the tree-based aggregation method initially proposed by Dwork [36] and Chan [35].

Tree-based aggregation. Assume for simplicity that $T = 2^a$ for some positive integer a. We create a binary tree $Tree_k$ for each video $k \in \mathcal{M}_i$, with leaf nodes $f_1, \dots, f_T$. As illustrated in Fig. 4, at each time slot, when a new reward is produced, we insert its value into the next leaf node, so over the time horizon [T] the rewards are inserted sequentially. Each internal node x in $Tree_k$ stores the sum of all leaf nodes in the subtree rooted at x. First, notice that any $v_t$ can be computed using at most log(T) nodes of $Tree_k$. Second, for any two neighboring datasets D and D' that differ in leaf node $f_i$ versus $f'_i$, at most log(T) nodes in $Tree_k$ are modified. So, if we flatten the complete tree into a vector, then for any neighboring datasets D and D' one can easily show that $\|Tree(D) - Tree(D')\|_1 \le \log(T)$. We further bound the amount of noise added to each tree in Section V when evaluating the performance of our algorithm.
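The tree-based aggregation above can be sketched as a private running counter. This illustrates the standard binary-tree technique rather than the authors' code: the class name `TreeCounter` is hypothetical, Laplace noise is drawn as the difference of two exponentials because the Python standard library has no Laplace sampler, and the noise scale uses $\log_2(T) + 1$ (counting the leaf level) as the $\ell_1$ sensitivity, slightly more conservative than the $\log(T)$ stated above.

```python
import math
import random

def laplace(scale: float, rng: random.Random) -> float:
    # The difference of two i.i.d. exponentials with mean `scale` is Laplace(0, scale).
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)

class TreeCounter:
    """Private running sum of rewards over T = 2^a slots via tree-based aggregation."""

    def __init__(self, T: int, eps: float, rng: random.Random):
        assert T > 0 and T & (T - 1) == 0, "T must be a power of two"
        self.T, self.rng, self.t = T, rng, 0
        self.height = T.bit_length() - 1      # a = log2(T)
        # One leaf touches height + 1 nodes (leaf included): the L1 sensitivity.
        self.scale = (self.height + 1) / eps
        self.exact = {}                       # (h, j) -> exact partial node sum
        self.noisy = {}                       # (h, j) -> released noisy node sum

    def insert(self, f: float) -> float:
        """Insert reward f_t into the next leaf and return the noisy prefix sum v_t."""
        assert self.t < self.T, "tree is full"
        t = self.t
        for h in range(self.height + 1):
            key = (h, t >> h)                 # ancestor covering leaves [j 2^h, (j+1) 2^h)
            self.exact[key] = self.exact.get(key, 0.0) + f
            if (t + 1) % (1 << h) == 0:       # node complete: release once, fresh noise
                self.noisy[key] = self.exact[key] + laplace(self.scale, self.rng)
        self.t += 1
        return self.prefix()

    def prefix(self) -> float:
        """Sum of at most log2(T) + 1 released nodes covering leaves 1..t."""
        total, start = 0.0, 0
        for h in range(self.height, -1, -1):
            if self.t & (1 << h):
                total += self.noisy[(h, start >> h)]
                start += 1 << h
        return total
```

Each prefix sum $v_t$ is assembled from the dyadic nodes given by the binary expansion of t, so at most $\log_2(T) + 1$ noisy nodes enter any single output.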

LaP-DAP Description. When service vendor i receives context $x_j(t)$ from service vendor j, it first determines the subspace C to which this context belongs and the level l of C. Then it checks whether each video $k \in \mathcal{M}_i$ has been selected enough times for accurate estimation (lines 4-6). If each video has been explored sufficiently,


Algorithm 2 LaP-DAP for other service vendors' users
1: Input: $k \in \mathcal{M}_i$, m, ρ, A, T, ε, $G_3(t)$.
2: Initialize $\mathcal{P} = \{\mathcal{X}\}$, l = 0, $M_C(0) = 0$; ∀k ∈ $\mathcal{M}_i$: $N_{k,C}(0) = 0$, $\bar r_{k,C}(0) = 0$.
3: Create an empty binary tree $Tree_k$ with T leaves, ∀k ∈ $\mathcal{M}_i$.
4: for t = 1, ..., T, $x_j(t) \in C$ do
5: if ∃k ∈ $\mathcal{M}_i$ such that $N_{k,C}(t) < G_3(t)$ then
6: Select k and insert the observed reward $r_{k,x_j(t)}(t)$ into $Tree_k$.
7: else
8: Select $k^* = \arg\max_{k \in \mathcal{M}_i} \bar r_{k,C}(t)$ and observe $r_{k^*,x_j(t)}(t)$.
9: Insert $r_{k^*,x_j(t)}(t) + \mathrm{Lap}(\lambda)$ into $Tree_{k^*}$.
10: end if
11: Update $M_C(t)$, $N_{k,C}(t)$.
12: if $M_C(t) > A m^{\rho l}$ then
13: Partition C.
14: end if
15: end for


we select the video $k^*$ with the highest estimated reward and observe the reward $r_{k^*,x_j(t)}(t)$. After the training process, service vendor j can access this observed reward of service vendor i and base its evaluation on it. To preserve the privacy of service vendor i regarding this information, we therefore add Laplace noise with scale $\lambda = K\log(T)/\varepsilon$ to $r_{k^*,x_j(t)}(t)$ (lines 7-9). Finally, we update the counters and decide whether to partition C, as described in Phase 3 (lines 11-14).



Fig. 4: An illustration of tree-based aggregation. Tree(D) and Tree(D') are built from two databases that differ in one component.

B. Algorithm Analysis 

The properties of the proposed algorithm are analyzed in this subsection. For simplicity of presentation, we refer to service vendors as learners. We prove that the regret is sublinear in time and that our P-DAP guarantees differential privacy.

1) Regret Bound: For each subspace C, let $\overline{u}_{k,C} = \sup_{x \in C} u_{k,x}$ and $\underline{u}_{k,C} = \inf_{x \in C} u_{k,x}$. Let $x^*$ be the context at the center of hypercube C. We define the optimal arm for subspace C as $k^*_C = \arg\max_{k \in \mathcal{K}_i} u_{k,x^*}$. Then the set of suboptimal arms for learner i in subspace C can be written as follows:

$$\mathcal{S}_{C,l,B} = \big\{k : \underline{u}_{k^*_C,C} - \overline{u}_{k,C} > B m^{-\alpha l}\big\}, \quad (7)$$

where B is a constant and α > 0. We will choose B to obtain the bound. The regret in (2) can be written as the sum of three components:

$$R(T) \le R_o(T) + R_s(T) + R_n(T), \quad (8)$$

where $R_o(T)$ is the regret due to selecting suboptimal arms from $\mathcal{M}_i$ by time T, $R_s(T)$ is the regret due to selecting suboptimal arms from $\mathcal{M}_{-i}$, and $R_n(T)$ is the regret of near-optimal selections by time T. Next, we bound each of these terms separately.
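Definition (7) can be evaluated directly when the extremes of each arm's expected reward over C are known. The sketch below assumes they are given as dictionaries (hypothetical inputs; in practice they are unknown to the learner) and returns $\mathcal{S}_{C,l,B}$; the remaining arms other than $k^*_C$ are the near-optimal ones whose regret is counted in $R_n(T)$.

```python
from typing import Dict, Set

def suboptimal_arms(center: Dict[str, float], inf_C: Dict[str, float],
                    sup_C: Dict[str, float], B: float, m: int,
                    alpha: float, level: int) -> Set[str]:
    """S_{C,l,B} from (7): arms whose best reward in C trails k*'s worst by > B m^(-alpha l)."""
    k_star = max(center, key=center.get)  # k*_C = argmax_k u_{k, x*}
    gap = B * m ** (-alpha * level)
    return {k for k in center if k != k_star and inf_C[k_star] - sup_C[k] > gap}
```

For a level-2 subspace with m = 2, α = 1 and B = 1, the threshold is $2^{-2} = 0.25$, so only arms trailing $k^*_C$ by more than 0.25 count as suboptimal.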

Theorem 2. For every level-l context subspace C, with control function $G_1(t) = m^{2\alpha l}\ln(T)$, the expected regret due to choosing a suboptimal arm $k \in \mathcal{M}_i$ is bounded as follows:

$$E[Reg_{k,C}(T)] \le m^{2\alpha l}\ln(T) + \frac{\pi^2}{3} + \frac{2T\Delta u}{\varepsilon}\big[\ln(K) + \ln(T)\big]. \quad (9)$$

Proof: The regret $E[Reg_{k,C}(T)]$ is due to: 1) the inherent gap of the bandit algorithm between the optimal and the suboptimal selections; 2) the gap between the approximately optimal reward obtained by applying the exponential mechanism and the optimal one (lines 11-14 in Algorithm 1):

$$E[Reg_{k,C}(T)] \le E\Big[\sum_{t=1}^{T}\big(u_{k^*,x(t)} - u_{k,x(t)}\big) + \big(u_{k,x(t)} - u_{\mathcal{E}(x(t)),x(t)}\big)\Big] = E[Reg^1_{k,C}(T)] + E[Reg^2_{k,C}(T)]. \quad (10)$$

Next, we bound the two parts of $E[Reg_{k,C}(T)]$ separately.

Lemma 1. The inherent regret gap $E[Reg^1_{k,C}(T)]$ of the bandit algorithm between optimal and suboptimal arms is bounded as follows:

$$E[Reg^1_{k,C}(T)] \le m^{2\alpha l}\ln(T) + \frac{\pi^2}{3}. \quad (11)$$

Proof: Let $E_k(T)$ denote the number of times suboptimal arm k is selected by time T. For $x \in C$, let $\Delta_{k,C} = u_{k^*,C} - u_{k,C}$ be the reward gap between suboptimal arm k and optimal arm $k^*$ in subspace C. As initially defined, the regret of choosing suboptimal arm k is the expected number of times k is selected multiplied by the gap in mean rewards, i.e., $E[Reg^1_{k,C}(T)] = E[E_k(T)]\cdot\Delta_{k,C} \le E[E_k(T)]$, since $\Delta_{k,C} \le 1$. Inequality (11) results from the fact that, with high probability, $E_k(T)$ will not be larger than $m^{2\alpha l}\ln(T)$. We now discuss inequality (11) in two cases.

Case 1. $E_k(T) \le m^{2\alpha l}\ln(T)$. In this case, (11) holds directly. We now focus on Case 2.

Case 2. $E_k(t') = m^{2\alpha l}\ln(T)$ for some $t' < T$. Then we have

$$Reg^1_{k,C}(T) \le \sum_{t=1}^{T} I(k \text{ is picked at time } t) \le m^{2\alpha l}\ln(T) + \sum_{t=t'}^{T} I(k \text{ is picked at time } t).$$

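Next, we compute the probability that k is selected under Case 2.

When $t > m^{2\alpha l}\ln(T)$, if k is selected, we have $\bar r_{k,C}(t) \ge \bar r_{k^*,C}(t)$; this inequality holds only if at least one of the following holds:

$$\bar r_{k,C}(t) \ge \overline{u}_{k,C} + H_t, \quad (12)$$

$$\bar r_{k^*,C}(t) \le \underline{u}_{k^*,C} - H_t, \quad (13)$$

$$\bar r_{k,C}(t) \ge \bar r_{k^*,C}(t),\quad \bar r_{k,C}(t) < \overline{u}_{k,C} + H_t,\quad \bar r_{k^*,C}(t) > \underline{u}_{k^*,C} - H_t. \quad (14)$$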

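Then the probability that suboptimal arm k is picked can be written as follows:

$$P[k \text{ is picked}, \text{Case 2}] \le P[\bar r_{k,C}(t) \ge \overline{u}_{k,C} + H_t] + P[\bar r_{k^*,C}(t) \le \underline{u}_{k^*,C} - H_t] + P[\bar r_{k,C}(t) \ge \bar r_{k^*,C}(t),\ \bar r_{k,C}(t) < \overline{u}_{k,C} + H_t,\ \bar r_{k^*,C}(t) > \underline{u}_{k^*,C} - H_t].$$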

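We denote by $w_{k,C}(t)$ the set of rewards of arm k in subspace C. Let $O_{k,C}(t)$ be the event that at most a samples in $w_{k,C}(t)$ are collected from suboptimal processing functions of the k-th arm. Unlike in classical finite-time bandit theory, these samples are not identically distributed. Inspired by [?], to facilitate the regret analysis we also generate two artificial i.i.d. processes to bound the probabilities related to $k \in \mathcal{M}_i$. The first one is the best process, in which rewards are generated according to a bounded i.i.d. process with expected reward $\overline{u}_{k,C}$; the other one is the worst process, in which rewards are generated according to a bounded i.i.d. process with expected reward $\underline{u}_{k,C}$. Let $\bar r^{b}_{k,C}(z)$ denote the sample mean of z samples from the best process and $\bar r^{w}_{k,C}(z)$ the sample mean of z samples from the worst process. Thus, combining (7), for any suboptimal arm we have

$$P\big(\bar r_{k,C}(t) \ge \bar r_{k^*,C}(t),\ \bar r_{k,C}(t) < \overline{u}_{k,C} + H_t,\ \bar r_{k^*,C}(t) > \underline{u}_{k^*,C} - H_t\big) \le P\Big(\bar r^{b}_{k,C}(|w_{k,C}(t)|) \ge \bar r^{w}_{k^*,C}(|w_{k^*,C}(t)|),\ \bar r^{b}_{k,C}(|w_{k,C}(t)|) < \overline{u}_{k,C} + L\Big(\frac{\sqrt d}{m^{l}}\Big)^{\alpha} + H_t,\ \bar r^{w}_{k^*,C}(|w_{k^*,C}(t)|) > \underline{u}_{k^*,C} - L\Big(\frac{\sqrt d}{m^{l}}\Big)^{\alpha} - H_t \,\Big|\, \text{Case 2}\Big).$$

Since k is a suboptimal arm, we have $\underline{u}_{k^*,C} - \overline{u}_{k,C} > B m^{-\alpha l}$, and on the event above

$$\bar r^{w}_{k^*,C}(|w_{k^*,C}(t)|) > \underline{u}_{k^*,C} - L\Big(\frac{\sqrt d}{m^{l}}\Big)^{\alpha} - H_t,\qquad \bar r^{b}_{k,C}(|w_{k,C}(t)|) < \overline{u}_{k,C} + L\Big(\frac{\sqrt d}{m^{l}}\Big)^{\alpha} + H_t.$$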

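Given the condition

$$2L\Big(\frac{\sqrt d}{m^{l}}\Big)^{\alpha} + 2H_t - \frac{B}{m^{\alpha l}} \le 0, \quad (15)$$

we have

$$\bar r^{b}_{k,C}(|w_{k,C}(t)|) < \bar r^{w}_{k^*,C}(|w_{k^*,C}(t)|),$$

which implies that the suboptimal arm will hardly be selected:

$$P\big(\bar r_{k,C}(t) \ge \bar r_{k^*,C}(t),\ \bar r_{k,C}(t) < \overline{u}_{k,C} + H_t,\ \bar r_{k^*,C}(t) > \underline{u}_{k^*,C} - H_t\big) = 0. \quad (16)$$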
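In Case 2, we have $n \ge m^{2\alpha l}\ln(t)$. To make (15) hold, we set $B > 2L(\sqrt d)^{\alpha} + 4$ and $H_t = m^{-\alpha l}$. Then, by Hoeffding's inequality, we have

$$P[\bar r_{k,C}(t) \ge \overline{u}_{k,C} + H_t] \le e^{-2nH_t^2} \le \frac{1}{t^2},\qquad P[\bar r_{k^*,C}(t) \le \underline{u}_{k^*,C} - H_t] \le \frac{1}{t^2}.$$

Thus, we have

$$P(k \text{ is picked}, \text{Case 2}) \le P(\bar r_{k,C}(t) \ge \overline{u}_{k,C} + H_t) + P(\bar r_{k^*,C}(t) \le \underline{u}_{k^*,C} - H_t) \le \frac{2}{t^2},$$

and then

$$E[Reg^1_{k,C}(T)] \le m^{2\alpha l}\ln(T) + \sum_{t=1}^{\infty}\frac{2}{t^2} = m^{2\alpha l}\ln(T) + \frac{\pi^2}{3}.$$

■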
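The Hoeffding step of Lemma 1 can be checked numerically. The sketch assumes the choice used above, $H_t = m^{-\alpha l}$ with at least $m^{2\alpha l}\ln(t)$ samples, so that $e^{-2nH_t^2} \le 1/t^2$; the function names are illustrative.

```python
import math

def hoeffding_tail(n: int, H: float) -> float:
    """Hoeffding: P(sample mean - mean >= H) <= exp(-2 n H^2) for n rewards in [0, 1]."""
    return math.exp(-2.0 * n * H * H)

def case2_pick_bound(t: int, m: int, alpha: float, level: int) -> float:
    """Bound on P(k picked, Case 2) at slot t: events (12) + (13), with H_t = m^(-alpha l)."""
    H_t = m ** (-alpha * level)
    n = math.ceil(m ** (2 * alpha * level) * math.log(t))  # smallest count allowed in Case 2
    return 2.0 * hoeffding_tail(n, H_t)
```

Since $\sum_{t \ge 1} 2/t^2 = \pi^2/3$, the per-slot bounds add up to the constant term in (11).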

Before deriving Lemma 3, we provide a bound on the sensitivity of the exponential mechanism.

Lemma 2. The sensitivity of the exponential mechanism is bounded as follows:

$$\Delta u \le L m^{-\alpha l}. \quad (17)$$

Proof: In our framework, $x_1$ and $x_2$ are two input data (users' context vectors) that differ in at most one component. The utility function u(x, k) represents the recommendation reward, which depends on the input context x and the output video k. By Definition 3 and inequality (4), we have

$$\Delta u = \max_{k \in \mathcal{M}_i}\ \max_{x_1,x_2:\ \|x_1 - x_2\|_1 \le 1}\big|u(x_1,k) - u(x_2,k)\big| = \max_{k \in \mathcal{M}_i}\ \max_{x_1,x_2:\ \|x_1 - x_2\|_1 \le 1}\big|u_{k,x_1} - u_{k,x_2}\big| \le \max_{k \in \mathcal{M}_i}\ \max_{x_1,x_2:\ \|x_1 - x_2\|_1 \le 1} L\|x_1 - x_2\|^{\alpha} \le L m^{-\alpha l}.$$

■
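Lemma 3 relies on the standard utility guarantee of the exponential mechanism (Theorem 1): with |R| = K and t = ln(T), the selected arm falls short of the best by at least $\frac{2\Delta u}{\varepsilon}(\ln(K) + \ln(T))$ with probability at most 1/T. The sketch below computes that probability exactly for a given utility vector; the sensitivity `du` is passed in as a parameter, so the check does not depend on the constant in (17), and the function names are illustrative.

```python
import math
from typing import List

def exp_mech_dist(utils: List[float], eps: float, du: float) -> List[float]:
    """Output distribution of the exponential mechanism over K arms."""
    top = max(utils)
    w = [math.exp(eps * (u - top) / (2.0 * du)) for u in utils]
    z = sum(w)
    return [x / z for x in w]

def bad_event_prob(utils: List[float], eps: float, du: float, T: int) -> float:
    """Exact P[u(output) <= OPT - (2 du / eps)(ln K + ln T)]; Theorem 1 bounds it by 1/T."""
    K = len(utils)
    gap = 2.0 * du / eps * (math.log(K) + math.log(T))
    opt = max(utils)
    p = exp_mech_dist(utils, eps, du)
    return sum(pk for pk, u in zip(p, utils) if u <= opt - gap)
```

The bound holds because each bad arm has weight at most $e^{-(\ln K + \ln T)}$ relative to the best arm, and there are at most K bad arms.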


Combining Lemma 2 and Theorem 1, we can derive Lemma 
3 as follows: 

Lemma 3. The regret due to the near-optimal reward obtained when applying the exponential mechanism can be bounded as follows:

$$E[Reg^2_{k,C}(T)] \le \frac{2T\Delta u}{\varepsilon}\big[\ln(K) + \ln(T)\big]. \quad (18)$$

Proof: At each time slot, we do not choose the arm with the highest reward; instead, we assign each arm a probability of being chosen. Thus, at each time slot the randomized selection incurs a reward gap. Using Theorem 1, in inequality (6) we have |R| = K and $|R_{opt}| = 1$ (there is only one optimal arm), and we set t = ln(T). Thus, at each time slot, the regret of the randomized selection satisfies

$$u(x,k) - u(x,\mathcal{E}(x)) \le \frac{2\Delta u}{\varepsilon}\big(\ln(K) + \ln(T)\big),$$

which fails with probability less than 1/T. Then, we have

$$E[Reg^2_{k,C}(T)] = \sum_{t=1}^{T} E\big[u_{k,x(t)} - u_{\mathcal{E}(x(t)),x(t)}\big] \le \sum_{t=1}^{T}\Delta_{\mathcal{E}}\cdot P\Big[\Delta_{\mathcal{E}} \le \frac{2\Delta u}{\varepsilon}\big(\ln(K) + \ln(T)\big)\Big] \le \frac{2T\Delta u}{\varepsilon}\big(\ln(K) + \ln(T)\big),$$

where $\Delta_{\mathcal{E}} = u(x,k) - u(x,\mathcal{E}(x)) = u_{k,x(t)} - u_{\mathcal{E}(x(t)),x(t)}$ denotes the per-slot regret of the exponential-mechanism selection. ■

Combining Lemma 1, Lemma 3 (with the sensitivity bound of Lemma 2) and inequality (10), Theorem 2 holds. ■

Theorem 2 implies that, for $k \in \mathcal{M}_i$, the proposed algorithm ensures that suboptimal arms are selected more than $m^{2\alpha l}\ln(T)$ times only with very small probability.

Lemma 4. For $k \in \mathcal{M}_{-i}$, with control functions $G_2(t) = m^{2\alpha l}\ln(t) + \cdots$ and $G_1(t) = m^{2\alpha l}\ln(T)$, the regret of choosing a suboptimal k in subspace C by time T is bounded as follows:

$$E[Reg_{k,C}(T)] \le 2m^{2\alpha l}\ln(T) + \cdots\,(1 + \cdots)\ln(T), \quad (19)$$

where Γ is a near-maximum value of the total amount of noise added by time T. We bound Γ in Lemma 5.

Proof: When we add Laplace noise to each reward, our estimate of the actual reward is perturbed, and the number of plays needed to find the optimal arm increases. However, we show that, after each arm has been trained $G_1(t)$ times, with high probability no more than $m^{2\alpha l}\ln(T) + \cdots$ further plays are needed before the optimal arm is found.

For $k \in \mathcal{M}_{-i}$, let k be a suboptimal arm and $k^*$ the optimal arm for subspace C. At the t-th time slot, suboptimal arm k is selected over $k^*$ if $\hat r_{k,C}(t) \ge \hat r_{k^*,C}(t)$, where $\hat r_{k,C}(t)$ is the virtual (noise-perturbed) reward of arm k for subspace C; we denote by $R_{k,C}(t)$ the true reward of arm k for subspace C. Thus, suboptimal arm k is selected only if

$$\hat r_{k,C}(t) \ge \hat r_{k^*,C}(t). \quad (20)$$

It can easily be shown that (20) holds only if one of the following holds:

$$R_{k,C}(t) \ge \overline{u}_{k,C} + H_t, \quad (21)$$

$$R_{k^*,C}(t) \le \underline{u}_{k^*,C} - H_t, \quad (22)$$

$$\hat r_{k,C}(t) \ge \hat r_{k^*,C}(t),\quad R_{k,C}(t) < \overline{u}_{k,C} + H_t,\quad R_{k^*,C}(t) > \underline{u}_{k^*,C} - H_t.$$

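As discussed above for $k \in \mathcal{M}_i$, we again use the best process and the worst process to bound these probabilities. Then, we have

$$\bar r^{b}_{k,C}(|w_{k,C}(t)|) < \overline{u}_{k,C} + L\Big(\frac{\sqrt d}{m^{l}}\Big)^{\alpha} + H_t + \frac{\Gamma}{n},\qquad \bar r^{w}_{k^*,C}(|w_{k^*,C}(t)|) > \underline{u}_{k^*,C} - L\Big(\frac{\sqrt d}{m^{l}}\Big)^{\alpha} - H_t - \frac{\Gamma}{n}.$$

When k is a suboptimal arm, we have $\underline{u}_{k^*,C} - \overline{u}_{k,C} > B m^{-\alpha l}$. Together, these imply that (20) cannot hold if

$$2L\Big(\frac{\sqrt d}{m^{l}}\Big)^{\alpha} + 2H_t + \frac{2\Gamma}{n} - \frac{B}{m^{\alpha l}} < 0.$$

For $n \ge m^{2\alpha l}\ln(t)$, $H_t = m^{-\alpha l}$ and $B > 2L(\sqrt d)^{\alpha} + 4$, we conclude that (20) cannot hold when the following holds:

$$\frac{2\Gamma}{n} - \frac{2}{m^{\alpha l}} < 0.$$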

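We therefore conclude that when $n > m^{2\alpha l}\ln(T) + \Gamma m^{\alpha l}$, inequality (20) cannot hold (we use $S_{k,C}(t)$ to denote this case). Directly applying the Chernoff bound, we can show that

$$P\big(R_{k,C}(t) \ge \overline{u}_{k,C} + H_t\big) \le \frac{1}{t^2}, \quad (24)$$

$$P\big(R_{k^*,C}(t) \le \underline{u}_{k^*,C} - H_t\big) \le \frac{1}{t^2}. \quad (25)$$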

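Let $O_{k,C}(t)$ be the event that at most a samples in $w_{k,C}(t)$ are collected from suboptimal processing functions of the k-th arm. Obviously, for any $k \in \mathcal{M}_i$, $O_{k,C}(t) = \Omega$, while this is not always true for $k \in \mathcal{M}_{-i}$. Combining (24) and (25), for $k \in \mathcal{M}_{-i}$ we have

$$P\big(k \text{ is picked}, S_{k,C}(t)\big) \le \frac{2}{t^2} + P\big(O^c_{k,C}(t), S_{k,C}(t)\big).$$

For $k \in \mathcal{M}_i$, we obviously have $P\big(O^c_{k,C}(t)\big) = 0$.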
For $k \in \mathcal{M}_{-i}$, let $Y_{k,C}(t)$ denote the random variable counting the number of times a suboptimal function m of arm k is chosen when event $S_{k,C}(t)$ holds. We have $\{O^c_{k,C}(t), S_{k,C}(t)\} = \{Y_{k,C}(t) > a\}$. Applying the Markov inequality, we have $P\big(O^c_{k,C}(t), S_{k,C}(t)\big) \le E[Y_{k,C}(t)]/a$. Let $E_{k,C}(t)$ be the event that a suboptimal processing function $m \in \mathcal{M}_k$ is called by learner k when it is invoked by learner i for the t-th time. We have

$$E[Y_{k,C}(t)] = \sum_{t'=1}^{t} P\big(E_{k,C}(t')\big)$$

and

$$P\big(E_{k,C}(t)\big) \le \sum_{m \in \mathcal{M}_k} P\big(\bar r_{m,C}(t) \ge \bar r_{m^*,C}(t)\big).$$

After each video $m \in \mathcal{M}_k$ has been fully explored $G_3(t) = m^{2\alpha l}\ln(t)/K$ times, as we proved in Lemma 1, we have

$$P\big(E_{k,C}(t)\big) \le \sum_{m \in \mathcal{M}_k} 2e^{-2(\cdots)}.$$

Together, these bound $E[Y_{k,C}(t)] = \sum_{t'=1}^{t} P\big(E_{k,C}(t')\big)$. Therefore, from the Markov inequality we get

$$P\big(O^c_{k,C}(t), S_{k,C}(t)\big) \le \frac{E[Y_{k,C}(t)]}{a} \le \cdots\ln(T).$$

Then, for arm $k \in \mathcal{M}_{-i}$, we have

$$E[Reg_{k,C}(T)] \le E\Big[\sum_{t=1}^{T} I(k \text{ is picked})\Big] \le 2m^{2\alpha l}\ln(T) + \cdots + \sum_{t=1}^{T}\Big[P\big(O^c_{k,C}(t), S_{k,C}(t)\big) + P\big(O_{k,C}(t), S_{k,C}(t)\big)\Big] \le 2m^{2\alpha l}\ln(T) + \cdots\,(1 + \cdots)\ln(T).$$

■


Lemma 5. For all arms $k \in \mathcal{M}_{-i}$ and all time steps $t \in [T]$, with probability at least $1 - \sigma$ (over the randomness of the noise), the amount of noise Γ added to the total reward of k up to time t satisfies

$$|N_k(t)| \le \frac{\Theta}{\varepsilon}\log^2(T)\log\big(\Theta T\log(T)/\sigma\big),$$

where Θ is the number of arms belonging to $\mathcal{M}_{-i}$.

Proof: For ease of notation, let $R_k(t)$ be the true total reward of arm k until time t. As discussed above, $N_k(t) = \hat R_k(t) - R_k(t)$ is a sum of at most log(T) random variables distributed as $\mathrm{Lap}(\Theta\log(T)/\varepsilon)$. By the tail property of the Laplace distribution, for a random variable $x \sim \mathrm{Lap}(\lambda)$, with probability $1 - \phi$ we have $|x| \le \lambda\log(1/\phi)$. So, with probability at least $(1 - \phi/\log(T))^{\log(T)} \ge 1 - \phi$, we have $|N_k(t)| \le \frac{\Theta\log^2(T)}{\varepsilon}\log\big(\log(T)/\phi\big)$. Taking a union bound over all Θ arms and all T time steps and setting $\phi = \sigma/(\Theta T)$, we have, with probability at least $1 - \sigma$, for all $k \in \mathcal{M}_{-i}$ and all $t \in [T]$,

$$|N_k(t)| \le \frac{\Theta\log^2(T)}{\varepsilon}\log\big(\Theta T\log(T)/\sigma\big).$$

■
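The arithmetic of Lemma 5 can be reproduced step by step. The sketch assumes base-2 logarithms for the tree depth and per-node noise $\mathrm{Lap}(\Theta\log(T)/\varepsilon)$, as in the proof above; the function names are illustrative.

```python
import math

def laplace_abs_tail(x: float, scale: float) -> float:
    """P(|X| > x) for X ~ Lap(scale) and x >= 0."""
    return math.exp(-x / scale)

def lemma5_bound(theta: int, T: int, eps: float, sigma: float) -> float:
    """High-probability bound on |N_k(t)| following the steps of Lemma 5."""
    depth = math.log2(T)                      # at most log(T) noisy nodes per prefix sum
    scale = theta * depth / eps               # each node carries Lap(Theta log(T) / eps)
    phi = sigma / (theta * T)                 # union bound over Theta arms and T steps
    per_node = scale * math.log(depth / phi)  # exceeded w.p. phi / log(T) per node
    return depth * per_node                   # sum over at most log(T) nodes
```

For example, with Θ = 3, T = 1024, ε = 1 and σ = 0.05, the bound equals $3 \cdot 10^2 \cdot \ln(3 \cdot 1024 \cdot 10 / 0.05)/1$.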

Lemma 6. The regret $Reg^{n}_{C}(T)$ due to choosing near-optimal arms in each level-l subspace is bounded as follows:

$$Reg^{n}_{C}(T) \le A B m^{(\rho - \alpha) l}. \quad (26)$$

Proof: By the definition of near-optimal arms, the regret due to selecting a near-optimal arm is at most $B m^{-\alpha l}$. Because there can be at most $A m^{\rho l}$ slots for a level-l subspace according to the partitioning rule, the regret of this part is at most $A B m^{(\rho - \alpha) l}$. ■

Now, we combine the results of Lemma 4, Lemma 6 and Theorem 2 to obtain the complete regret bound. The regret depends on the context arrival process; hence, we let $H_l(T)$ denote the number of level-l subspaces of learner i that have been activated by time T.

Theorem 3. The complete regret of our private distributed learning algorithm is bounded by

$$R(T) \le \sum_{k \in \mathcal{M}_i}\sum_{l} H_l(T)\Big[m^{2\alpha l}\ln(T) + \frac{\pi^2}{3} + \frac{2T\Delta u}{\varepsilon}\big(\ln(K) + \ln(T)\big)\Big] + \sum_{k \in \mathcal{M}_{-i}}\sum_{l} H_l(T)\Big[2m^{2\alpha l}\ln(T) + \cdots(1 + \cdots)\ln(T)\Big] + \sum_{l} H_l(T)\,A B m^{(\rho - \alpha) l}. \quad (27)$$

Proof: Combining the results of Lemma 1 and Lemma 3, it is easy to see that $R_o(T)$ is bounded as follows:

$$R_o(T) \le \sum_{l} H_l(T)\sum_{k \in \mathcal{M}_i} E[Reg_{k,C}(T)] \le \sum_{l} H_l(T)\cdot K\Big[m^{2\alpha l}\ln(T) + \frac{\pi^2}{3} + \frac{2T\Delta u}{\varepsilon}\big(\ln(K) + \ln(T)\big)\Big].$$

By applying Lemma 4, $R_s(T)$ is bounded by

$$R_s(T) \le \sum_{k \in \mathcal{M}_{-i}}\sum_{l} H_l(T)\,E[Reg_{k,C}(T)] \le \sum_{k \in \mathcal{M}_{-i}}\sum_{l} H_l(T)\Big[2m^{2\alpha l}\ln(T) + \cdots(1 + \cdots)\ln(T)\Big].$$

Finally, by Lemma 6, $R_n(T)$ is bounded by

$$R_n(T) \le \sum_{l} H_l(T)\,Reg^{n}_{C}(T) \le \sum_{l} H_l(T)\,A B m^{(\rho - \alpha) l}.$$

Theorem 3 follows by summing the above three bounds. ■

The following corollary establishes the regret bound when the context arrivals are uniformly distributed over the entire context space. This is the worst-case scenario because the algorithm has to learn over the entire context space. Before deriving Corollary 2, we provide a bound on the highest level of active subspace by time T.

Lemma 7. Given a time T, the highest level of active subspace is at most $\lceil \log_m(T/A)/\rho \rceil + 1$.

Proof: The highest possible level of active subspace is achieved when all requests by time T have the same context. This requires $\sum_{l=0}^{l_{\max}-1} A m^{\rho l} < T$. Therefore, $l_{\max} \le \lceil \log_m(T/A)/\rho \rceil + 1$. ■

Corollary 2. If the context arrivals by time T are uniformly distributed over the context space, and we set the partition parameter ρ much larger than the similarity parameter α, we have

$$R(T) \le R_o(T) + R_s(T) + R_n(T) \le (\cdots)(K + M - 1) + (\cdots)\big(K + (M - 1)(\cdots)\big) + (\cdots)\big(\ln(K) + \ln(T)\big) + (\cdots)(M - 1)\ln(T). \quad (28)$$

Proof: First, we calculate the highest level of subspace when context arrivals are uniform. In the worst case, all level-l subspaces stay active until they are deactivated, after which all level-(l+1) subspaces become active, and so on. Let $l_{\max}$ be the maximum level of subspace under this scenario. Because there must be some time $T' < T$ at which all subspaces are level-$l_{\max}$ subspaces, we have

$$\sum_{l=0}^{l_{\max}-1} m^{dl} A m^{\rho l} < T,$$

where $m^{dl}$ is the maximum number of level-l subspaces and $A m^{\rho l}$ is the maximum number of time slots that belong to a level-l subspace. Thus, we have $l_{\max} < \frac{\log_m(T/A)}{d + \rho} + 1$. Combining this conclusion with the regret bound in Theorem 3, we obtain Corollary 2. ■

We have shown that the regret upper bound of our private distributed learning model is sublinear in time, implying that our computing service vendors learn to select the optimal videos over time. Fast convergence to the optimum is also favorable in dynamically changing big-data environments.

2) Differential Privacy: We finally prove that our algorithm preserves the privacy of users' contextual information and that of each service vendor's videos.

Theorem 4. Algorithm 1 preserves (ε, 0)-differential privacy for users' contextual information.

Proof: Let $x_1$ and $x_2$ be two input context vectors that differ in one single attribute, let μ denote the reward (utility) of the exponential mechanism, and let R denote the output space (sequence of selected videos) of the exponential mechanism. Suppose that the same user's data stream has arrived N times over an arbitrary time sequence $t_1, \dots, t_N$ and that, as a result, our algorithm selected an arbitrary sequence of arms such that $\mathcal{M}_{\mathcal{E}}(x_1, \mu, R) = \{k_1, k_2, \dots, k_N\}$ over this time sequence, where each selection uses privacy budget $\varepsilon' = \varepsilon/N$. We denote by $\mu(x_1, k_i)$ the mean reward of arm $k_i$ for context $x_1$ at time $t_i$. In our algorithm, $\mu(x_1, k_i)$ equals $\bar r_{k_i,C}(t_i)$, where C is the active subspace to which context $x_1$ belongs at time $t_i$. If $x_1$ and $x_2$ belong to the same subspace C at time $t_i$, then $\mu(x_1, k_i) = \mu(x_2, k_i)$. We construct a function $I(t_i, x_1, x_2)$ that equals zero when $x_1$ and $x_2$ belong to the same active subspace and one otherwise. We consider the relative probability of our algorithm for the given contexts $x_1$ and $x_2$:

$$\frac{P[\mathcal{M}_{\mathcal{E}}(x_1,\mu,R) = \{k_1,\dots,k_N\}]}{P[\mathcal{M}_{\mathcal{E}}(x_2,\mu,R) = \{k_1,\dots,k_N\}]} = \prod_{i=1}^{N}\frac{\exp\big(\frac{\varepsilon'\mu(x_1,k_i)}{2\Delta u}\big)\Big/\sum_{k' \in R}\exp\big(\frac{\varepsilon'\mu(x_1,k')}{2\Delta u}\big)}{\exp\big(\frac{\varepsilon'\mu(x_2,k_i)}{2\Delta u}\big)\Big/\sum_{k' \in R}\exp\big(\frac{\varepsilon'\mu(x_2,k')}{2\Delta u}\big)}$$

$$= \prod_{i=1}^{N}\exp\Big(\frac{\varepsilon'\big(\mu(x_1,k_i) - \mu(x_2,k_i)\big)}{2\Delta u}\Big)\cdot\frac{\sum_{k' \in R}\exp\big(\frac{\varepsilon'\mu(x_2,k')}{2\Delta u}\big)}{\sum_{k' \in R}\exp\big(\frac{\varepsilon'\mu(x_1,k')}{2\Delta u}\big)}$$

$$\le \prod_{i=1}^{N}\exp\Big(\frac{\varepsilon'}{2}\,I(t_i,x_1,x_2)\Big)\cdot\exp\Big(\frac{\varepsilon'}{2}\,I(t_i,x_1,x_2)\Big) = \prod_{i=1}^{N}\exp\big(\varepsilon'\,I(t_i,x_1,x_2)\big) \le \exp(N\varepsilon') = \exp(\varepsilon).$$

Thus, the theorem follows. ■
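The per-step inequality in the proof of Theorem 4 is the usual privacy guarantee of the exponential mechanism: if every utility changes by at most $\Delta u$, no output probability changes by more than a factor $e^{\varepsilon'}$, and N such steps compose to $e^{N\varepsilon'} = e^{\varepsilon}$. The sketch below checks the per-step bound numerically on random utility vectors; it is an illustration rather than part of the proof, and the function names are hypothetical.

```python
import math
import random
from typing import List

def exp_mech(utils: List[float], eps: float, du: float) -> List[float]:
    """Exponential mechanism: P[k] proportional to exp(eps * u_k / (2 du))."""
    top = max(utils)
    w = [math.exp(eps * (u - top) / (2.0 * du)) for u in utils]
    z = sum(w)
    return [x / z for x in w]

def max_log_ratio(u1: List[float], u2: List[float], eps: float, du: float) -> float:
    """Largest |ln(P1[k] / P2[k])| over outputs k for two utility vectors."""
    p1, p2 = exp_mech(u1, eps, du), exp_mech(u2, eps, du)
    return max(abs(math.log(a / b)) for a, b in zip(p1, p2))
```

When $x_1$ and $x_2$ fall in the same active subspace the two utility vectors coincide and the ratio is exactly 1, which is the case $I(t_i, x_1, x_2) = 0$.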

Theorem 5. Algorithm 2 preserves (ε, 0)-differential privacy for service vendors' videos.

Proof: For $k \in \mathcal{M}_{-i}$ and subspace C, let $[T'] = \{1, \dots, T'\}$ denote, for simplicity, the sequence of time slots in which video k is selected, where $T' \le T$. Let $D = (f_1, \dots, f_{T'})$ be a dataset of true rewards. We call a dataset D' a neighbor of D if it differs from D in exactly one reward. Let $F_t(C)$ denote the virtual outcome (the reward with noise added) and λ the scale of the Laplace noise. Then, at each round, the ratio of the probabilities of the same outcome for two different arms $k_1$ and $k_2$ is

$$\frac{P[\mathcal{M}_L(k_1,t) = F_t(C)]}{P[\mathcal{M}_L(k_2,t) = F_t(C)]} = \exp\Big(-\frac{1}{\lambda}\big(|f_t(k_1) - F_t(C)| - |f_t(k_2) - F_t(C)|\big)\Big) \le \exp\Big(\frac{1}{\lambda}|f_t(k_1) - f_t(k_2)|\Big) = \exp\Big(\frac{1}{\lambda}\|f_t(k_1) - f_t(k_2)\|_1\Big) \le \exp(\varepsilon').$$

In our problem model, the proposed algorithm accesses the rewards for its computation only via the tree-based aggregation scheme. Learner i maintains M − 1 trees, one for each other learner's reward set. Each tree guarantees ε' = ε/(M − 1)-differential privacy. By the composition property stated in Corollary 1, we conclude that Algorithm 2 is ε-differentially private. ■
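Theorem 5 combines the Laplace mechanism with sequential composition over the M − 1 trees. The sketch below checks the density-ratio bound $e^{\varepsilon'}$ when the noise scale equals the sensitivity divided by $\varepsilon' = \varepsilon/(M-1)$; the sensitivity value used in the check is an illustrative assumption, and the function names are hypothetical.

```python
import math

def laplace_pdf(y: float, mu: float, scale: float) -> float:
    """Density of Lap(mu, scale) at y."""
    return math.exp(-abs(y - mu) / scale) / (2.0 * scale)

def per_tree_epsilon(eps: float, M: int) -> float:
    """Budget of each of the M - 1 neighbor trees so that the composition totals eps."""
    return eps / (M - 1)
```

With M = 4 learners and ε = 1, each tree runs at ε' = 1/3, and the three trees compose back to ε = 1.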

Theorem 4 shows that the attributes (e.g., social status, hobbies and age) in users' sensitive context vectors cannot be inferred from the recommendation results. Theorem 5 shows that service vendors cannot extract information about the videos in neighbor service vendors' repositories from the rewards. In summary, Theorems 4 and 5 prove that the proposed algorithm P-DAP preserves the privacy of both users and service vendors simultaneously.

VI. Geometric Differential Privacy 

In the previous section, we preserve privacy to the same extent for all subspaces; that is, we set the same value of ε for the whole context space. This section presents our refined geometric differentially private model. Owing to the sparsity and heterogeneity of big data, some context subspaces are densely populated with data points, whereas other subspaces are nearly empty. A large and growing number of statistical analyses can be performed in a differentially private manner while adding little noise. As also stated in [?], "the larger the dataset, the less a given amount of blurring will affect utility". Thus, our geometric differentially private algorithm varies the amount of noise added to each subspace according to the size of the subspace. Specifically, we lower the privacy level (use a larger value of ε) as the density of the data increases, where the level l indicates the density of a subspace. In this way, the performance loss caused by the randomness of differential privacy can be reduced substantially. For the currently active subspaces, we set the value of ε according to their density l; specifically, we increase ε as l increases. Fig. 5 illustrates this method. For simplicity, we take a one-dimensional context space as an example. The leaf nodes in Fig. 5 are the currently active subspaces, and each of them receives a value of ε determined by its density l.



Fig. 5: An illustrative example of the geometric private model. For simplicity, we assume the dimension of the context space is d = 1. The left segment shows the partition pattern. The right tree structure shows the partition process, where blue leaf nodes denote the active subspaces. Subspaces at different levels l get different values of ε.


The modified method works as follows. After we obtain enough context samples, we already have accurate reward estimates. From then on, for each context arrival, we first determine the subspace to which it belongs. Then we determine the level l of that subspace and set $\varepsilon = \varepsilon_0 m^{\alpha l}$ for level-l subspaces, where m and α are the constants defined previously.
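The per-level budget can be computed as below. The sketch assumes the rule $\varepsilon_l = \varepsilon_0 m^{\alpha l}$ stated above; the function names are illustrative, and the Laplace scale uses a unit sensitivity chosen only for illustration. It shows that deeper (denser) subspaces receive a larger ε and hence less noise.

```python
def geometric_epsilon(eps0: float, m: int, alpha: float, level: int) -> float:
    """Per-level budget eps_l = eps0 * m^(alpha * l): deeper subspaces get a larger eps."""
    return eps0 * m ** (alpha * level)

def laplace_scale(sensitivity: float, eps: float) -> float:
    """Laplace noise scale for a given sensitivity and privacy budget."""
    return sensitivity / eps
```

For example, with $\varepsilon_0 = 0.01$, m = 2 and α = 0.5, a level-2 subspace uses ε = 0.02, halving the noise scale relative to the root.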

Theorem 6. Geometric differential privacy has a lower 
regret bound than uniform differential privacy as follows: 


R 


:«(T) < R{T) - ((5)“m“ - l) Gi(5) 


+A2{^)‘‘+^ 


d-2oi 

d+p jYid—2a 


(29) 

where $A_1$ and $A_2$ are two constants. As T goes to infinity, the value of the second term on the right-hand side of the inequality increases exponentially. Thus, Theorem 6 shows that our geometric differential privacy greatly reduces the regret bound.

Proof: We set $\varepsilon = \varepsilon_0 m^{\alpha l}$ and set the amount of noise Γ correspondingly in the geometric differential privacy method. Thus, we have:


< R{T) - Ebr (M - 

- Etr K {ln{T) + ln{T)) ■ - 2^) 


For simplicity, we use $A_1$ and $A_2$ to denote the two constant factors above, respectively. Hence, Theorem 6 holds. ■


VII. Experimental Results and Analysis 

In this section, we validate the theoretical regret bounds of our algorithms with empirical results on very large real-world datasets, which include massive multimedia data and big data generated by social media users. We show that: 1) the regret grows sublinearly over time; 2) our differentially private methods work well and do not come at the expense of recommendation accuracy; 3) the geometric differentially private method has a lower regret and higher accuracy. Finally, we use users' context vectors extracted from the real datasets to test the recommendation accuracy of our algorithms.

A. Experimental Setup 

To evaluate the performance of our recommendation system, training and test data about users and videos must be gathered. We collect numerous user context vectors extracted from large real datasets from Sina Microblog, a popular online social networking site in China. These datasets contain users' social profiles and the multimedia content they shared. We also extract public information, such as video attributes and popular videos, from Youku, a prevalent video sharing site (VSS) in China. After preprocessing, around 74,000 video items and 578,000 user context vectors with 13,900 dimensions are stored.

For simplicity, we deploy the recommendation system on a small-scale framework with four distributed video service vendors. Using the collected video data, we construct a set of 1000 videos for each service vendor. Following the real situation, we assign different video items to different service vendors. We randomly sample 200,000 users (context vectors) from our stored datasets and feed these vectors into our simulated recommendation system sequentially. When a user arrives, a service vendor selects a particular video to recommend. At the end of the time slot, the reward of this selection, a binary random number (0 or 1), is produced to imitate the user's click action. Since our scheme belongs to the class of online distributed learning techniques, we compare it against several previous approaches:

• Centralized learning with adaptive partition (CAP) (3^ : 
There is only one learner in this centralized framework 
who partitions the context space dynamically over time 
according to the number of user arrivals. 

• Distributed learning with uniform partition (DUP) llUl : 
This distributed framework contains multiple cooperative 
learners. But all of them uniformly partition context space 
initially. No partition process is involved over time. 

• Distributed learning with adaptive partition (DAP): This 
is the primal model of the proposed P-DAP. Multiple 
learners in this distributed framework adaptively partition 
the context space over time (No privacy preservation in 
this model). 

Finally, to thoroughly analyze the performance of our proposed algorithms, we organize our experiments in the following steps:

Step 1. We first compare our primal model DAP with previous work, i.e., CAP and DUP [31]. We feed the 200,000 sampled users' context vectors sequentially into each of these three models; that is, each model receives the same input dataset with 200,000 elements. We plot the regret and the average regret (to evaluate the convergence rate) of each model. Afterwards, we extract four groups (of different sizes) of user context vectors from the collected real datasets and feed them into CAP, DUP and DAP to test the performance of each model.



(a) Regrets (b) Average Regrets 

Fig. 6: Regrets in CAP, DUP and DAP 


| N | CAP | DUP | DAP |
|---|---|---|---|
| 5000 | 70.32% | 73.32% | 87.34% |
| 10000 | 75.36% | 74.52% | 90.21% |
| 20000 | 76.88% | 74.78% | 91.02% |
| 50000 | 77.34% | 75.08% | 92.17% |

TABLE II: Average accuracies of DAP, CAP and DUP


Step 2. We construct our differentially private model (P-DAP) on top of Step 1. For each vendor's own users, arms (videos and other service vendors) are randomly selected according to the computed probabilities. Simultaneously, Laplace noise is added when recommending videos to other service vendors' users. To demonstrate the smooth trade-off between privacy and accuracy in P-DAP, we vary the privacy constant ε from 0.01 to 1 and compare the results with the non-private model (DAP). Finally, we use the four extracted groups of context vectors to test the accuracy of these models.

Step 3. To demonstrate the lower regret of the geometric differentially private method (GP-DAP), we set different values of ε for different context subspaces. Specifically, the value of ε increases with the density of data points in each subspace. Then, we compare the regrets of GP-DAP and P-DAP (ε = 0.01) over time.

B. Results and Analysis 

We first evaluate DAP's performance in terms of regret and average regret in Step 1. Meanwhile, we compare DAP, CAP and DUP and plot the regret curves in Fig. 6.

Fig. 6(a) compares DAP, CAP and DUP in terms of regret, where the horizontal axis is the number of user arrivals. From the trend of the regret curves, we conclude that the regret of DAP grows sublinearly over time, and DAP clearly has lower regret than DUP and CAP throughout. Fig. 6(b) records the average regrets (normalized by the number of arrivals) of DAP, CAP and DUP, where the horizontal axis is the number of user arrivals. As we can see, our primal model DAP converges fast and has a lower average regret than CAP and DUP. The results also show that the average regret of DAP at the tail of the curve is extremely small (less than 0.02 per user).

Table II records the average accuracies (total reward divided by the number of arrivals) in our tests, where N represents the number of context vectors used in the test. We find that as the number of arrivals increases, the average accuracy of each model improves as well. This could result
(a) Regret (b) Average Regret

Fig. 7: Regrets in P-DAP and DAP 



(a) Regret (b) Average Accuracy 

Fig. 8: Regrets and Accuracies in P-DAP and GP-DAP 



P-DAP 

DAP 

N 

e=0.01 

e=0.1 

e=l 

5000 

77.54% 

77.78% 

78.96% 

86.84% 

10000 

78.75% 

79.35% 

79.24% 

89.28% 

20000 

79.92% 

80.34% 

81.06% 

90.23% 

50000 

80.12% 

81.16% 

82.12% 

91.56% 


TABLE III: Average accuracies of DAP and P-DAP 


from the fact that systems trained better as number of samples 
increased. Also, we can read from the table that the average 
accuracy of our DAP can reach up to 92%, but neither those 
of CAP nor DUP can exceed 80 %. Finally, We can draw the 
conclusion that DAP outperforms CAP and DUP. 
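The metric used in Table II, total reward divided by the number of arrivals, can be evaluated at several checkpoints N from a single click log. This sketch assumes a binary reward (1 for a click, 0 otherwise); the function name is hypothetical and the log generated in the usage block is synthetic, not the paper's data.

```python
import random
from itertools import accumulate
from typing import Dict, Sequence


def accuracy_at_checkpoints(clicks: Sequence[int],
                            checkpoints: Sequence[int]) -> Dict[int, float]:
    """Average accuracy after the first N arrivals: (total clicks) / N.

    clicks[t] is 1 if the t-th user clicked the recommended video, else 0.
    """
    prefix = list(accumulate(clicks))  # prefix[n - 1] = clicks among the first n arrivals
    result = {}
    for n in checkpoints:
        if not 1 <= n <= len(clicks):
            raise ValueError(f"checkpoint {n} outside 1..{len(clicks)}")
        result[n] = prefix[n - 1] / n
    return result


if __name__ == "__main__":
    rng = random.Random(0)
    # Synthetic click log: each recommendation is clicked with probability 0.8.
    log = [1 if rng.random() < 0.8 else 0 for _ in range(50000)]
    print(accuracy_at_checkpoints(log, [5000, 10000, 20000, 50000]))
```

Using prefix sums lets all checkpoints be read off one pass over the log, which matches how the tables report accuracy for increasing N.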

Fig. 7 gives the simulation results of P-DAP.
Fig. 7 (a) shows that the regrets of both P-DAP and DAP grow
sublinearly over time. More specifically, the regret curves show
that as the privacy-preservation level increases (smaller ε), the
regret converges more slowly. Fig. 7 (b) shows that our
differentially private P-DAP has low average regret (no more
than 0.03 per time slot) even at a high privacy-preservation
level (e.g., ε = 0.01). As expected, the non-private algorithm
attains the lowest regret. Moreover, the regret of P-DAP
approaches the non-private regret as the privacy preservation
becomes weaker (larger ε).
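The conclusion states that the framework uses the exponential mechanism alongside the Laplace mechanism, and the setup describes recommending videos according to computed probabilities. A minimal sketch of the standard exponential mechanism, assuming unit sensitivity and hypothetical utility scores for three candidate videos, shows one plausible reason why a smaller ε slows convergence: the selection distribution becomes flatter, so recommendations are closer to random.

```python
import math
import random
from typing import List, Sequence


def exp_mech_probs(utilities: Sequence[float], epsilon: float,
                   sensitivity: float = 1.0) -> List[float]:
    """Selection probabilities of the exponential mechanism:
    Pr[i] is proportional to exp(epsilon * u_i / (2 * sensitivity))."""
    if epsilon <= 0 or sensitivity <= 0:
        raise ValueError("epsilon and sensitivity must be positive")
    top = max(utilities)
    # Shifting by the maximum utility avoids overflow and leaves the ratios unchanged.
    weights = [math.exp(epsilon * (u - top) / (2.0 * sensitivity)) for u in utilities]
    total = sum(weights)
    return [w / total for w in weights]


def exp_mech_select(utilities: Sequence[float], epsilon: float,
                    rng: random.Random, sensitivity: float = 1.0) -> int:
    """Sample the index of one video according to exp_mech_probs."""
    probs = exp_mech_probs(utilities, epsilon, sensitivity)
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]


if __name__ == "__main__":
    scores = [90.0, 50.0, 10.0]  # hypothetical utility scores of three candidate videos
    for eps in (0.01, 0.1, 1.0):
        p_best = exp_mech_probs(scores, eps)[0]
        print(f"epsilon={eps}: Pr[best video]={p_best:.3f}")
    print(exp_mech_select(scores, 0.1, random.Random(0)))
```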

Table III records the tested average accuracies of DAP and
P-DAP under different privacy-preservation levels. As the table
shows, the average accuracy of DAP reaches 91.56%, and the
average accuracies of P-DAP for all tested values of ε exceed
80% once N reaches 50000.
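The accuracy cost of privacy in Table III can be read off as the gap, in percentage points, between DAP and each P-DAP setting. The values below are transcribed from Table III; the dictionary keys and the function name are labels chosen for this sketch.

```python
from typing import Dict

# Average accuracies (%) transcribed from Table III.
TABLE_III: Dict[int, Dict[str, float]] = {
    5000:  {"P-DAP eps=0.01": 77.54, "P-DAP eps=0.1": 77.78, "P-DAP eps=1": 78.96, "DAP": 86.84},
    10000: {"P-DAP eps=0.01": 78.75, "P-DAP eps=0.1": 79.35, "P-DAP eps=1": 79.24, "DAP": 89.28},
    20000: {"P-DAP eps=0.01": 79.92, "P-DAP eps=0.1": 80.34, "P-DAP eps=1": 81.06, "DAP": 90.23},
    50000: {"P-DAP eps=0.01": 80.12, "P-DAP eps=0.1": 81.16, "P-DAP eps=1": 82.12, "DAP": 91.56},
}


def privacy_cost(table: Dict[int, Dict[str, float]]) -> Dict[int, Dict[str, float]]:
    """Accuracy gap (percentage points) between DAP and each P-DAP setting, per N."""
    return {
        n: {name: round(row["DAP"] - acc, 2) for name, acc in row.items() if name != "DAP"}
        for n, row in table.items()
    }


if __name__ == "__main__":
    for n, gaps in privacy_cost(TABLE_III).items():
        print(n, gaps)
```

For example, at N = 50000 the gap is 11.44 points for ε = 0.01 and 9.44 points for ε = 1.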

Fig. 8 shows the simulation results of GP-DAP and
P-DAP, where P-DAP uses ε = 0.01. Fig. 8 (a) shows that the
regret of GP-DAP is 32% lower than that of P-DAP,
indicating that GP-DAP substantially reduces the regret loss.
We also test the accuracies of P-DAP and GP-DAP on data sets
of different sizes, and Fig. 8 (b) compares the resulting average
accuracies. Both GP-DAP and P-DAP achieve high accuracy for
each group, and the accuracies increase slightly as the group
size grows. GP-DAP consistently achieves higher accuracy than
P-DAP.
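The 32% figure can be read as a relative regret reduction. Writing R_GP(T) and R_P(T) for the cumulative regrets of GP-DAP and P-DAP after T user arrivals (notation assumed here, since it is not defined in this section), the statement is:

```latex
\frac{R_{\mathrm{P}}(T) - R_{\mathrm{GP}}(T)}{R_{\mathrm{P}}(T)} \approx 0.32
\quad\Longleftrightarrow\quad
R_{\mathrm{GP}}(T) \approx 0.68\, R_{\mathrm{P}}(T).
```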

Table IV records the test results of GP-DAP and P-DAP
(ε = 0.01) for user groups of different sizes. The accuracies
increase slightly as more context samples are added to the test
group, because more samples help the systems better estimate
each processing function. The table also shows that the average
accuracy of GP-DAP exceeds 88% (88.17%) when the number of
user arrivals reaches 30000.

TABLE IV: Tested average accuracies of GP-DAP and P-DAP

N         GP-DAP    P-DAP (ε=0.01)
5000      82.44%    77.31%
10000     84.36%    77.87%
15000     85.23%    78.42%
20000     86.74%    79.14%
25000     87.33%    80.06%
30000     88.17%    81.64%
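The explanation that more samples yield better estimates can be illustrated with a generic statistical fact: the standard error of an empirical click-through rate shrinks as 1/√n. This sketch assumes that clicks for a fixed context and video are independent Bernoulli trials with a hypothetical click probability; it does not reproduce the paper's estimator, and the function names are illustrative.

```python
import math
import random


def ctr_standard_error(ctr: float, n: int) -> float:
    """Standard error sqrt(p(1 - p) / n) of an empirical click-through rate from n users."""
    if not 0.0 <= ctr <= 1.0 or n <= 0:
        raise ValueError("need 0 <= ctr <= 1 and n > 0")
    return math.sqrt(ctr * (1.0 - ctr) / n)


def simulated_mae(ctr: float, n: int, trials: int, rng: random.Random) -> float:
    """Mean absolute error of the empirical CTR over repeated simulated groups of n users."""
    total = 0.0
    for _ in range(trials):
        clicks = sum(1 for _ in range(n) if rng.random() < ctr)
        total += abs(clicks / n - ctr)
    return total / trials


if __name__ == "__main__":
    rng = random.Random(0)
    true_ctr = 0.8  # hypothetical click probability for one context/video pair
    for n in (5000, 10000, 20000, 30000):
        print(n, round(ctr_standard_error(true_ctr, n), 5),
              round(simulated_mae(true_ctr, n, trials=20, rng=rng), 5))
```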


VIII. Conclusion 

In this paper, we have presented a differentially private
distributed learning framework for video recommendation in
online social networks. To tackle the large scale and
heterogeneity of big data, we adopt dynamic space partitioning
in a distributed contextual bandit. To protect the privacy
of both social network users and video service vendors, we
use the exponential mechanism and the Laplace mechanism
simultaneously. Furthermore, to alleviate the performance loss
caused by introducing differential privacy, we refine our framework
into a novel geometric differentially private model. We have
theoretically analyzed our algorithms in terms of performance
loss (regret) and privacy preservation. We have also evaluated
our algorithms, demonstrating their sublinear regrets, a delicate
trade-off between performance loss and privacy-preserving level,
and a substantial reduction in regret loss by the geometric model.
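The dynamic space partitioning mentioned above can be sketched as an adaptive hypercube partition of a normalized context space [0, 1]^d, in which a cube splits into 2^d children once it has received enough arrivals. The split rule `split_threshold(level)`, the class names and the assumption that contexts are scaled to [0, 1] are illustrative; the paper's own partition rule is not reproduced in this section and may differ.

```python
import random
from dataclasses import dataclass
from itertools import product
from typing import Callable, List, Sequence, Tuple


@dataclass
class Cube:
    lower: Tuple[float, ...]  # lower corner of the hypercube
    side: float               # edge length (1 / 2**level)
    level: int
    count: int = 0            # context arrivals observed in this cube

    def contains(self, x: Sequence[float]) -> bool:
        # Half-open [lo, lo + side), except that the outer boundary 1.0 is included.
        return all(lo <= xi < lo + self.side or (xi == 1.0 and lo + self.side == 1.0)
                   for lo, xi in zip(self.lower, x))


class AdaptivePartition:
    """Hypothetical dynamic partition of the context space [0, 1]^dim.

    A cube is split into 2**dim children once its arrival count reaches
    split_threshold(level), so densely visited regions are refined more finely.
    """

    def __init__(self, dim: int, split_threshold: Callable[[int], int]):
        self.dim = dim
        self.split_threshold = split_threshold
        self.cubes: List[Cube] = [Cube(lower=(0.0,) * dim, side=1.0, level=0)]

    def find(self, x: Sequence[float]) -> Cube:
        for cube in self.cubes:
            if cube.contains(x):
                return cube
        raise ValueError(f"context {x} lies outside [0, 1]^{self.dim}")

    def observe(self, x: Sequence[float]) -> Cube:
        """Record one arrival with context x; return the cube it was counted in."""
        cube = self.find(x)
        cube.count += 1
        if cube.count >= self.split_threshold(cube.level):
            self._split(cube)
        return cube

    def _split(self, cube: Cube) -> None:
        half = cube.side / 2.0
        children = [
            Cube(lower=tuple(lo + off for lo, off in zip(cube.lower, offsets)),
                 side=half, level=cube.level + 1)
            for offsets in product((0.0, half), repeat=self.dim)
        ]
        # Replace the parent cube with its 2**dim children.
        self.cubes = [c for c in self.cubes if c is not cube] + children


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical threshold: a level-l cube splits after 20 * 2**l arrivals.
    part = AdaptivePartition(dim=2, split_threshold=lambda level: 20 * 2 ** level)
    for _ in range(2000):
        # Contexts concentrated near (0.2, 0.2), so refinement is uneven.
        x = (min(1.0, abs(rng.gauss(0.2, 0.1))), min(1.0, abs(rng.gauss(0.2, 0.1))))
        part.observe(x)
    print(len(part.cubes), max(c.level for c in part.cubes))
```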